sync(master): absorb 544 upstream commits — per-arch refactor + Gemma4 12B by marksverdhei · Pull Request #59 · heiervang-technologies/ht-llama.cpp

marksverdhei · 2026-06-04T10:44:00Z

Why

Markus asked for ht-on-top-of-master so Gemma 4 12B (`Gemma4ForCausalLM`) is runnable.

That architecture is a text-only variant added upstream in PR #23682 (commit `dbe9c0c8c`) and registered in `conversion/gemma.py:617`. It rides along with the larger per-arch model refactor (commit `994118a18`) that moved `llama_model::build_graph()` out of one monolithic switch and into 132 per-arch classes under `src/models/`.

Before this merge, neither change was on `origin/ht`. After this merge, both are.

What

`origin/ht` + `origin/master` (FF'd to upstream `006640408` earlier today) → merge commit `df61855b7`.

544 commits absorbed from upstream.
151 ht-only commits preserved as-is (TurboQ, multi-model router, presets UI, etc.).
985 files in the merge commit.

Conflict resolution

427 webui rename/delete conflicts (master renamed `tools/server/webui/` → `tools/ui/`; ht deleted them in #54, "chore: remove embedded webui source code, replace with heierchat pointer"): all resolved as `rm`, since heierchat owns the webui.
3 `tools/server/public/bundle.*` UD conflicts: kept ht's deployed heierchat bundle.
`.github/workflows/ui.yml` (master modified, ht deleted): kept deletion.
`convert_hf_to_gguf.py` (ht monolithic vs master's wrapper into `conversion/` package): took master's wrapper. Ht-only attributes `_is_ocr_config` (HunyuanOCR detection) and `_is_vision_tensor` (LFM2 vision filter) are dropped — master's modular equivalents handle these via different code paths (see `conversion/hunyuan.py` line 249 and `conversion/lfm2.py` line 183). Behavior may diverge on edge cases; flagged below.
`.gitignore`, `ggml/src/ggml-cpu/arch-fallback.h`, 4 READMEs: trivial textual merges (master's flag renames + re-added ht's TBQ entries).
`tests/test-backend-ops.cpp`: master added a new `ne_dst` parameter to `test_cpy`'s constructor signature; updated ht's TurboQ + f16/bf16 test cases to use the new signature (`{-1,-1,-1,-1}` for `ne_dst`).
`tools/server/server-chat.cpp` (1 conflict): kept master's `SRV_WRN` logging for non-function Responses tools, kept ht's "skip instead of reject" intent.
`tools/server/server-models.cpp` (2 conflicts):
- `log_available_models` + `apply_stop_timeout`: kept master's lambda refactor, inserted ht's discovered-adapters print block inside `log_available_models`.
- `wait_until_ready` / `wait_until_unloaded` (from #55, "feat(server): wait for load and unload operations to complete") + `update_loaded_info` (from master): kept all three methods.
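
The `ne_dst` handling noted for `tests/test-backend-ops.cpp` can be pictured with a small Python sketch. It assumes that `{-1,-1,-1,-1}` acts as a "use the source shape" sentinel and that a reshaping copy requires equal element counts, as the upstream `test_cpy` commit message describes. `resolve_ne_dst` is a hypothetical helper for illustration, not a function in the test file.

```python
from math import prod

# Assumed sentinel: all -1 means "destination has the same shape as the source".
USE_SRC_SHAPE = (-1, -1, -1, -1)

def resolve_ne_dst(ne_src, ne_dst=USE_SRC_SHAPE):
    """Return the destination shape for a CPY test case (hypothetical helper)."""
    if tuple(ne_dst) == USE_SRC_SHAPE:
        return tuple(ne_src)
    # A reshaping copy only makes sense when both shapes hold the same
    # number of elements.
    if prod(ne_dst) != prod(ne_src):
        raise ValueError("src and dst must hold the same number of elements")
    return tuple(ne_dst)
```

Under this reading, passing the sentinel keeps the ht TurboQ and f16/bf16 cases behaving as same-shape copies.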

Verified

✅ `cmake -B build -DGGML_CPU=ON -DLLAMA_BUILD_APP=ON` configures clean (no extra cache-clear needed)
✅ `cmake --build build --target llama-server` succeeds end-to-end (100%, no warnings on touched paths)
✅ `llama-server --help` runs, shows the new upstream flags (e.g. `--vision-gemma-12b-default`)
✅ `conversion/gemma.py:617` registers `Gemma4ForCausalLM` — 12B conversion path present
✅ `src/models/gemma4.cpp` present (132 files in `src/models/` now)

Known follow-ups

Gemma 4 12B layer-count case: `src/models/gemma4.cpp:23-27` still only maps `{30, 35, 42, 60}` → `{LLM_TYPE_26B_A4B, LLM_TYPE_E2B, LLM_TYPE_E4B, LLM_TYPE_31B}`. 12B's actual layer count isn't in upstream's switch either — falls to `LLM_TYPE_UNKNOWN`. Model still loads functionally; just logs "unknown" instead of "12B". Add a case once we run an actual 12B GGUF and read its `n_layer`.
Conversion behavior for Hunyuan-OCR / LFM2-Audio: master's per-arch implementations differ from ht's previous monolithic checks. Worth a spot-test if those models are still in active rotation.
DFlash branch (PR #53, "feat(dflash): complete DFlash speculative decoding integration"): held per Markus's instruction. `feat/dflash-integration` is unaffected by this merge; it will need its own rebase onto the new ht, plus a port to the per-arch class hierarchy.
CUDA build: this verification was CPU-only. Titan deploy will need `-DGGML_CUDA=ON` build.
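
The Gemma 4 layer-count follow-up above can be sketched as a lookup with a fallback. This is a Python stand-in for the C++ switch in `src/models/gemma4.cpp`, not the upstream code; the dict and function names are hypothetical, and only the four layer counts and type names listed in the follow-up are used.

```python
# Layer counts and type names as listed in the follow-up note. The dict and
# function are illustrative stand-ins for the C++ switch, not upstream code.
GEMMA4_TYPE_BY_N_LAYER = {
    30: "LLM_TYPE_26B_A4B",
    35: "LLM_TYPE_E2B",
    42: "LLM_TYPE_E4B",
    60: "LLM_TYPE_31B",
}

def gemma4_type(n_layer):
    """Map a layer count to a type name, falling back to 'unknown'."""
    return GEMMA4_TYPE_BY_N_LAYER.get(n_layer, "LLM_TYPE_UNKNOWN")
```

Any layer count missing from the table resolves to `LLM_TYPE_UNKNOWN`, which is what the 12B model currently hits. Adding the 12B case is a single extra entry once its `n_layer` has been read from a real GGUF.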

* vulkan: optimize operations in the IM2COL shader * Add comments and improve the code formatting

…refactor (ggml-org#23345) * mtmd : deepseek-ocr fixes, improvements and refactoring - image processing changes to achieve full parity with Pillow (reference impl) - SAM mask casting only when flash-attn is on - SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it) - llama-chat changes to fix server/WebUI issue (new media_markers_first()) - adapted test-chat-template and added test cases for deepseek-ocr - changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model - ty.toml ignore unresolved-import for tools/mtmd/tests/** * image-text reordering fix removed * refactor bool add_padding + pad_rounding enum into a single pad_style enum

…3386) ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi

* opencl: refactor initialization * opencl: refactor GPU identification * opencl: rename for consistency * opencl: cache global mem size in dev_ctx * opencl: adjust log level * opencl: load argsort and flash_attn kernels in supports_op * argsort kernel must be built for supports_op for querying the max workgroups * flash_attn kernel has many variants, only load them when needed

* Move to backend sampling for MTP draft path Run top_k(10) on the draft backend. D2H transfers happen only for the top 10 logits Make backend sampling more robust and fallback to CPU on failure cases, such as with "-sm tensor" or when a backend doesn't support TOP_K. * Allow sampler chains to be partially offloaded to backend * Add --spec-draft-backend-sampling argument. Enabled by default.

* webui: Add max image size option * remove magic numbers * support all image formats * use const * Move regex to match b64 images to constants * use SETTINGS_KEYS to get max image resolution setting * Do not touch the image if already under the size threshold

…ision (ggml-org#23329) - HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. - Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* hexagon: remove gathers and better handling of vtcm in ssm-conv * hexagon: relax ssm-conv gating requirements * hexagon: add new prefill ssm-conv backend test * hexagon: remove trailing white space * hex-rope: uninline rope_cache_init, otherwise it breaks after rebasing with SSM_CONV changes --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>

…r SWA-only models (ggml-org#23131) When a model has zero non-SWA attention layers (e.g. a SWA-only slice of Gemma 4), the base KV cache has no layer tensors. The input tensors (self_k_idxs, self_v_idxs, self_kq_mask) are created as graph input nodes but never consumed by any compute node, so the backend scheduler never allocates a buffer for them. Calling mctx->get_base()->set_input_k_idxs() on an unallocated tensor then hits GGML_ASSERT(buffer) at ggml-backend.cpp:194. The same scenario applies symmetrically: if a model had zero SWA layers, the SWA tensors would be unallocated. Fix: guard both the base and SWA set_input calls with null/buffer checks, matching the pattern already used by llm_graph_input_mem_hybrid_iswa::set_input (line ~674) which has the comment: 'base tensors may not be allocated if there are no non-SWA attention layers'. Also fix can_reuse() in the same class to skip the ne[0] and kq_mask checks for unallocated tensors, preventing a null-dereference on the reuse path.

…gml-org#23306) Probably no backends implement only one of 2d get/set, but this might be annoying for some future backend developer trying to add 2d get/set.

* refactor: Improve Git Hooks for UI development * fix: Address review comments * fix: Use absolute git path for `/hooks` Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>

* vocab : add Carbon-3B (HybridDNATokenizer) support Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}. The base BPE is Qwen3-4B-Base's; what differs is that text inside <dna>...</dna> regions is chunked into fixed 6-mers (right-padded with 'A' on the trailing partial), and any base outside ACGT maps to <oov>. * src/llama-vocab.{h,cpp}: new pre-type, dispatched from llm_tokenizer_bpe_session::tokenize. * src/llama-vocab-carbon.h: pure helpers (tokenize_carbon, emit_dna_kmers) factored out for unit testing — no llama_vocab dependency, vocab access goes through a std::function. * conversion/base.py: detect HybridDNATokenizer by class name in get_vocab_base_pre (chktxt collides with Qwen3 base since it has no <dna>), and pass trust_remote_code=True in get_vocab_base so the custom tokenizer class can load. * tests/test-tokenizer-carbon.cpp: 12 cases covering single 6-mer, multi 6-mer, lowercase, invalid base -> <oov>, partial k-mer right-pad, mixed text+DNA, empty <dna></dna>, unterminated <dna>, two regions, vocab miss. * vocab : align Carbon-3B changes with llama.cpp conventions * Fold tokenize_carbon + emit_dna_kmers inline into llm_tokenizer_bpe_session (drop src/llama-vocab-carbon.h), matching how every other tokenizer keeps its helpers inside llama-vocab.cpp. * Replace the standalone unit test with the conventional test-tokenizer-0 row backed by models/ggml-vocab-carbon.gguf (vocab-only conversion) + .inp/.out fixtures covering single 6-mer, multi 6-mer, lowercase, invalid base -> <oov>, partial right-pad, mixed text+DNA, empty <dna></dna>, unterminated <dna>, two regions. * Register "carbon" in convert_hf_to_gguf_update.py's model list (pointing at HuggingFaceBio/Carbon-3B) and teach both AutoTokenizer call sites in the updater to pass trust_remote_code=True for it, matching how t5 is special-cased. 
* vocab : move Carbon dispatch to _set_vocab_carbon + LlamaModel branch Refactor the conversion-side changes to follow the per-tokenizer-family convention used by _set_vocab_qwen, _set_vocab_interns1, _set_vocab_glm, etc. instead of conditionalising the shared get_vocab_base / get_vocab_base_pre paths. * conversion/base.py: add _set_vocab_carbon — self-contained, loads with trust_remote_code=True so HybridDNATokenizer's merged Qwen3 + DNA vocab is visible, writes tokenizer.ggml.pre = "carbon" directly. * conversion/llama.py: branch in LlamaModel.set_vocab on tokenizer_config.json["tokenizer_class"] == "HybridDNATokenizer" and dispatch to _set_vocab_carbon. Same precedent as conversion/bert.py (tokenizer_class branch between BertTokenizer / RobertaTokenizer) and conversion/phi.py. * conversion/base.py: revert the conditional in get_vocab_base and the class-name short-circuit in the auto-generated get_vocab_base_pre. * tests : expand ggml-vocab-carbon.gguf fixtures with model-card examples Add 6 cases from the Carbon-3B model card on top of the existing edge coverage: the unterminated basic-completion prompt, the closed 33-bp example, the metadata-conditioned prompt (with <vertebrate_mammalian> and <protein_coding_region> which BPE-decompose since they are not in the vocab), the documented anti-pattern of raw DNA without <dna> tags, and the two likelihood-scoring examples. Brings the suite to 19 cases. * vocab : promote HybridDNATokenizer to its own LLAMA_VOCAB_TYPE Refactor per upstream review: > This should be its own tokenizer model, ie. carbonhybriddna instead > of gpt2 and not carbon pre-tokenizer. That way you can keep the > correct pre-tokenizer, in case that ever changes. 
Previously the tokenizer was modelled as LLAMA_VOCAB_TYPE_BPE plus a new LLAMA_VOCAB_PRE_TYPE_CARBON, which (a) put a CARBON-specific branch inside llm_tokenizer_bpe_session::tokenize (only existing pre-types differ in regex, not dispatch logic), and (b) conflated "hybrid DNA tokenization" with "Qwen3 BPE pre-tokenizer". This change moves it to its own vocab type, peer to PLAMO2, with the GGUF model name matching the HF tokenizer class (HybridDNATokenizer): * include/llama.h: new LLAMA_VOCAB_TYPE_HYBRIDDNA = 7. * src/llama-vocab.cpp: new llm_tokenizer_hybriddna + session that owns std::unique_ptr<llm_tokenizer_bpe> for non-<dna> text and routes raw text through a DNA-aware splitter; wired into init_tokenizer, tokenize, type_name, byte_to_token, and the BPE-style token_to_piece case (DNA k-mers + <dna>/</dna>/<oov> are pure ASCII, so byte-level BPE decoding handles them). LLAMA_VOCAB_TYPE_HYBRIDDNA gets its own branch in the vocab-type config block alongside SPM/WPM/UGM/RWKV, where pre_type is set to QWEN2 and the matching add_space_prefix / escape_whitespaces / clean_spaces flags are applied — mirroring qwen2's BPE path so byte-level BPE merging stays bit-identical to the Python reference for non-DNA text. * src/llama-vocab.h: drop the short-lived LLAMA_VOCAB_PRE_TYPE_CARBON. * conversion/base.py: _set_vocab_hybriddna writes tokenizer.ggml.model = "hybriddna" (no separate pre). * conversion/llama.py: dispatch on tokenizer_class == "HybridDNATokenizer" same as bert.py / phi.py do. * models/ggml-vocab-hybriddna.gguf{,.inp,.out}: renamed fixture + regenerated metadata. * convert_hf_to_gguf_update.py: drop the stale chkhsh entry and trust_remote_code special-case (no longer needed since dispatch is now class-name driven, not chkhsh). 
Verified end-to-end against HuggingFaceBio/Carbon-{500M,3B,8B}: tokenization is bit-identical to the Python HybridDNATokenizer for all 19 test fixtures plus the model-card metadata-conditioned prompt; greedy completion produces the same DNA continuation as the Python reference; spec-dec with 500M as draft for 8B still works. * vocab : relax llm_tokenizer_bpe assert to allow HYBRIDDNA * vocab : drop llm_tokenizer_bpe vocab-type assert * vocab : write tokenizer.ggml.pre for HYBRIDDNA, share BPE dispatch * vocab : assert BPE or HYBRIDDNA in llm_tokenizer_bpe * vocab : annotate #endif with PRETOKENIZERDEBUG * vocab : drop local hybriddna fixture (moves to ggml-org/vocabs) * deduplicate * simplify * simplify --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
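
The 6-mer chunking that the HybridDNATokenizer commit message describes for `<dna>...</dna>` regions can be sketched in a few lines of Python. This is a minimal illustration, not the upstream C++ implementation. Two details are assumptions: lowercase input is upper-cased before chunking, and a k-mer containing any base outside ACGT is emitted as a single `<oov>` token. The function name `dna_kmers` is hypothetical.

```python
K = 6                    # fixed k-mer length from the commit message
VALID_BASES = set("ACGT")

def dna_kmers(seq, k=K):
    """Split the text inside a <dna> region into fixed-size k-mers."""
    seq = seq.upper()    # assumption: lowercase bases are normalized
    tokens = []
    for start in range(0, len(seq), k):
        # Right-pad the trailing partial k-mer with 'A'.
        kmer = seq[start:start + k].ljust(k, "A")
        # Assumption: an invalid base turns the whole k-mer into <oov>.
        tokens.append(kmer if set(kmer) <= VALID_BASES else "<oov>")
    return tokens
```

For example, `dna_kmers("ACGTACGT")` gives `["ACGTAC", "GTAAAA"]` under these assumptions, and an empty region gives no tokens.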

when doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required.

…23459) * app : add batched-bench, fit-params, quantize & perplexity Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add missing main.cpp Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add EOL Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>

…d binary (ggml-org#23442)

…#23461) The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or draft model (model_dft). For MTP (Multi-Token Prediction) models, ctx_dft holds GPU-allocated resources (KV cache, compute buffers) that are not freed when entering the sleeping state. On each sleep/resume cycle, new resources are allocated without the old ones being freed, leading to a VRAM leak that eventually crashes the server with out-of-memory errors. Fix by explicitly resetting spec, ctx_dft, and model_dft in destroy() before resetting llama_init, ensuring proper cleanup order to avoid use-after-free. ref: ggml-org#23395 Assisted-by: llama.cpp:local pi

…3411) * metal : fix GGML_OP_SET kernel threads * tests : extend test_cpy to support different src/dst shapes Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where the total number of elements must match. - Renamed ne -> ne_src, added ne_dst parameter (default: use src shape) - Added 50 new reshaping test cases covering 1D<->2D<->3D<->4D conversions - Tests exercise 1024 boundary, small shapes, and large dimensionality changes - Fixed dangling reference bug (storing & to temporary std::array) - Updated all existing test calls with permute/transpose args for compatibility Assisted-by: llama.cpp:local pi * metal : optimize concat kernel with row batching for small widths When ne0 < 256, batch multiple rows into a single threadgroup to improve occupancy. This avoids underutilizing the GPU when processing narrow tensors. - Dispatch nth = min(256, ne0) threads per group - Calculate nrptg (rows per threadgroup) to fill up to 256 threads - Update kernel index calculation to handle the row batching - Add boundary check for i1 >= ne1 Assisted-by: llama.cpp:local pi * tests : clean-up * tests : refactor CPY shape tests to use dimension permutations Replace 75 hardcoded test cases with a loop over permutations of {3, 5, 7, 32} (total elements: 3360). Each src permutation is tested against canonical sorted and reverse dst, skipping identical shapes. Covers F32, F16, and Q4_0 (when both src and dst ne0 == 32). Assisted-by: llama.cpp:local pi
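
The permutation-based CPY shape tests described in the commit message above can be reproduced as a short Python sketch. It follows one reading of that message: every permutation of `{3, 5, 7, 32}` is a source shape, each is paired with the sorted and reversed shapes as destinations, and identical pairs are skipped. The helper names are hypothetical, and treating index 0 of a shape as `ne0` is an assumption.

```python
from itertools import permutations
from math import prod

DIMS = (3, 5, 7, 32)   # dimension set named in the commit message

def cpy_shape_cases(dims=DIMS):
    """Enumerate (src, dst) shape pairs for reshaping CPY tests."""
    canonical = tuple(sorted(dims))          # (3, 5, 7, 32)
    reverse = tuple(reversed(canonical))     # (32, 7, 5, 3)
    cases = []
    for src in permutations(dims):
        for dst in (canonical, reverse):
            if src != dst:                   # skip identical shapes
                cases.append((src, dst))
    return cases

def q4_0_eligible(src, dst):
    """Q4_0 is only covered when both src and dst ne0 equal 32."""
    return src[0] == 32 and dst[0] == 32
```

Under this reading the loop yields 46 shape pairs per type, each with 3360 elements on both sides. Only the pairs whose first dimension is 32 on both sides qualify for Q4_0.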

Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for clients to monitor prompt evaluation progress during processing.
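
A client could use the three new fields roughly as follows. This is a hedged sketch: it assumes `/slots` returns a JSON array of slot objects and that `n_prompt_tokens_processed` counts toward `n_prompt_tokens`. The helper name `prompt_progress` and the sample payload values are made up for illustration.

```python
import json

def prompt_progress(slot):
    """Summarize prompt evaluation progress for one slot object."""
    total = slot.get("n_prompt_tokens", 0)
    processed = slot.get("n_prompt_tokens_processed", 0)
    cached = slot.get("n_prompt_tokens_cache", 0)
    # No prompt assigned yet: report no fraction rather than dividing by zero.
    fraction = None if total <= 0 else min(processed / total, 1.0)
    return {"fraction": fraction, "cached": cached}

# Illustrative payload only; the values are not taken from a real server.
sample = json.loads(
    '[{"id": 0, "n_prompt_tokens": 2000,'
    ' "n_prompt_tokens_processed": 500, "n_prompt_tokens_cache": 0}]'
)
report = [prompt_progress(slot) for slot in sample]
```

Polling the endpoint and recomputing the fraction would give a simple progress indicator while a long prompt is being evaluated.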

* tests : move save-load-state from examples to tests - Move examples/save-load-state/ to tests/test-save-load-state.cpp - Remove subdirectory reference from examples/CMakeLists.txt - Add test to tests/CMakeLists.txt as a model test - Remove CODEOWNERS entry for removed example directory Assisted-by: llama.cpp:local pi * cont : update ci

* vulkan: fuse snake activation (mul, sin, sqr, mul, add) Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition emitted by audio decoders (BigVGAN, Vocos) for snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel. test_snake_fuse from the CUDA PR now also compares CPU naive vs Vulkan fused across F32 / F16 / BF16. * vulkan: address jeffbolznv review for fused snake activation Rename T / C to ne0 / ne1 in the shader and push constants to match the standard naming convention used across the Vulkan backend. Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous (the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be tightly packed on the broadcast dim (the shader reads data_a[i1]). * vulkan: tighten snake fusion type checks for all operands (address jeffbolznv review) * vulkan: reject snake fusion when ne[2] or ne[3] > 1 (address jeffbolznv review) * vulkan: address 0cc4m review for fused snake activation snake.comp is renamed to follow the ggml DATA_A_* / A_TYPE convention. A_TYPE now applies to the activation tensor data_a instead of the broadcast multiplier, and the bindings become data_a (A_TYPE), data_b (float), data_c (float) and data_d (D_TYPE). A header at the top of the shader maps each buffer to its role in y = x + sin(b * x)^2 * c. On the C++ side, ggml_vk_can_fuse_snake reuses the existing snake_pattern constant instead of duplicating the op list, sin_node is extracted as a named local alongside the other chain nodes, and the broadcast operands a and inv_b are now required to be GGML_TYPE_F32 to match the hardcoded float bindings on data_b and data_c (the previous a->type == x->type would silently reject any future BF16 or F16 chain once the supports_op gate for SIN / SQR is lifted). 
ggml_vk_snake_dispatch_fused gets an explicit GGML_TYPE_F32 case and GGML_ABORT on default in place of the silent f32 fallback, and a stale comment about data_a[i1] / data_inv_b[i1] is refreshed to match the new binding names.
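
The fusion described in the snake activation commit message can be illustrated in plain Python by comparing the naive five-op chain (mul, sin, sqr, mul, add) with a single elementwise pass over `y = x + sin(a*x)^2 * inv_b`. This is a CPU-side sketch, not the Vulkan shader. The flat indexing `idx = i0 + i1 * ne0` and the per-row broadcast of `a` and `inv_b` follow the commit text, while the function names are hypothetical.

```python
import math

def snake_naive(x, a, inv_b, ne0, ne1):
    """Five separate elementwise passes, as a graph of primitive ops would run."""
    rows = [i // ne0 for i in range(ne0 * ne1)]      # row index i1 per flat idx
    t = [a[i1] * xi for xi, i1 in zip(x, rows)]      # mul
    t = [math.sin(v) for v in t]                     # sin
    t = [v * v for v in t]                           # sqr
    t = [v * inv_b[i1] for v, i1 in zip(t, rows)]    # mul
    return [xi + v for xi, v in zip(x, t)]           # add

def snake_fused(x, a, inv_b, ne0, ne1):
    """One pass per element, mirroring what a fused kernel computes."""
    out = [0.0] * (ne0 * ne1)
    for i1 in range(ne1):
        for i0 in range(ne0):
            idx = i0 + i1 * ne0                      # contiguous layout
            s = math.sin(a[i1] * x[idx])
            out[idx] = x[idx] + s * s * inv_b[i1]
    return out
```

The fused form touches each element once instead of producing four intermediate buffers. That is the saving the rewrite targets, and it is also why the matcher requires `x` and `dst` to be contiguous.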

…default (ggml-org#23462) * cmake : remove STATIC from impl libraries, allow BUILD_SHARED_LIBS control Remove explicit STATIC from all -impl libraries (server, cli, completion, bench, batched-bench, fit-params, quantize, perplexity) so BUILD_SHARED_LIBS controls shared vs static linkage. Add WINDOWS_EXPORT_ALL_SYMBOLS ON for proper DLL export on Windows. Assisted-by: llama.cpp:local pi * cmake : enable LLAMA_BUILD_APP by default Assisted-by: llama.cpp:local pi * ci : disable app in build-cmake-pkg.yml

…#23511) * pi : update * ci : fix ios build * ci : fix android * ci : fix apple builds * cmake : add install() for impl libraries Add install(TARGETS <target> LIBRARY) for all -impl libraries that were changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in commit bb28c1f. Without this, cmake --install fails to copy the shared libraries, causing runtime errors like: llama-server: error while loading shared libraries: libllama-server-impl.so Ref: ggml-org#23494 (comment) Assisted-by: llama.cpp:local pi * ci : fix xcframework build

* vocab : mark hybriddna k-mers to avoid BPE token collisions * improved loop --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

* ggml-zendnn : add Q8_0 quantization support * ggml-zendnn : sync with latest ZenDNN * ggml-zendnn : address review comments for Q8_0

* model: support for Mellum architecture * model: improve mellum.py formatting * model: improve mellum.py formatting once again * deps: downgrade transformers to 4.57.6 (to fix CI) * deps: remove huggingface_hub dependency * deps: remove huggingface_hub from test requirements --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* hex-ops: fix profiler output (i.e. remove the redundant NONEs) * hex-prof: update profiling script to support tot.usec column

…l-org#24006)

…gml-org#23425)

* tests : add support for qwen3 SSM archs * arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS * cont : naming + TODOs

* cuda: reserve space for quantize kv-cache at startup * address review comments * remove forward decl Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * remove assert in ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

…gml-org#24030) * Removes __restrict__ from PDL kernel headers due to incompatibility with PDL. Adds preprocessor directives based on arch in kernel body to add __restrict__ to retain performance on older architectures. * Simplifies new __restrict__ usage via macro * Add hopper to PDL __restrict__ fix. Co-authored-by: Oliver Simons <osimons@nvidia.com> --------- Co-authored-by: Oliver Simons <osimons@nvidia.com>

* add model * nits

* qwen35: use post-norm hidden state for MTP * rename pre_norm to nextn * fix step35

* Tidy up SYCL doc a bit - Add explicit links to referenced items - Fix spelling errors Signed-off-by: Todd Malsbary <todd.malsbary@intel.com> * Correct documented default for GGML_SYCL_GRAPH The default is ON, not OFF: $ cmake -LAH -B build | grep GGML_SYCL_GRAPH ... GGML_SYCL_GRAPH:BOOL=ON Signed-off-by: Todd Malsbary <todd.malsbary@intel.com> * Move docker instructions from SYCL.md to docker.md This makes them directly accessible from the Quick Start section of the top-level README.md. Signed-off-by: Todd Malsbary <todd.malsbary@intel.com> * Refer to intel.Dockerfile for ARGs and their defaults The defaults are always changing; this avoids accuracy errors from duplicating the information. Signed-off-by: Todd Malsbary <todd.malsbary@intel.com> * Remove mention of Nvidia in SYCL row of backend table This support was removed in 2026.02 - refer to the SYCL.md News. Signed-off-by: Todd Malsbary <todd.malsbary@intel.com> --------- Signed-off-by: Todd Malsbary <todd.malsbary@intel.com>

…2754) * ggml-cpu: add rvv 512b,1024b impls for iq4_xs * ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants * ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, iq2_xxs improve iq2_xs impl for rvv 256 Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> --------- Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai> Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

…rt (ggml-org#23834) * Start work on flash_attn refactor * Refactor * Split k/v quantization * Refactor and abstract quantization logic for flash_attn and mul_mat * Add quantization support to tile path * formatting * Move to functions, add a check

…#24073) * tests : refactor test-save-load-state to accept token input - Default prompt is now empty; when not provided, generate n_batch random tokens (useful for models without a tokenizer) - Tokenization happens once upfront; pass token vector to test functions - generate_tokens prints token IDs instead of decoded pieces - Use llama_model_get_vocab / llama_vocab_n_tokens API - Upgrade log level from LOG_TRC to LOG_INF for visibility Assisted-by: llama.cpp:local pi * cont : use llama_tokens alias

) * mtmd: handle Gemma 4 audio projector embedding size * rm projection_dim from clip_n_mmproj_embd --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

…abled (ggml-org#24053)

# Conflicts: # .github/workflows/ui.yml # .gitignore # convert_hf_to_gguf.py # ggml/src/ggml-cpu/arch-fallback.h # tests/test-backend-ops.cpp # tools/cli/README.md # tools/completion/README.md # tools/server/README-dev.md # tools/server/README.md # tools/server/public/bundle.css # tools/server/public/bundle.js # tools/server/public/index.html # tools/server/server-chat.cpp # tools/server/server-models.cpp # tools/ui/.gitignore # tools/ui/.npmrc # tools/ui/.prettierignore # tools/ui/.prettierrc # tools/ui/.storybook/decorators/ModeWatcherDecorator.svelte # tools/ui/.storybook/decorators/TooltipProviderDecorator.svelte # tools/ui/.storybook/main.ts # tools/ui/.storybook/preview.ts # tools/ui/.storybook/vitest.setup.ts # tools/ui/README.md # tools/ui/components.json # tools/ui/docs/architecture/high-level-architecture-simplified.md # tools/ui/docs/architecture/high-level-architecture.md # tools/ui/docs/flows/chat-flow.md # tools/ui/docs/flows/conversations-flow.md # tools/ui/docs/flows/data-flow-simplified-model-mode.md # tools/ui/docs/flows/data-flow-simplified-router-mode.md # tools/ui/docs/flows/database-flow.md # tools/ui/docs/flows/mcp-flow.md # tools/ui/docs/flows/models-flow.md # tools/ui/docs/flows/server-flow.md # tools/ui/docs/flows/settings-flow.md # tools/ui/eslint.config.js # tools/ui/package-lock.json # tools/ui/package.json # tools/ui/playwright.config.ts # tools/ui/scripts/dev.sh # tools/ui/scripts/vite-plugin-llama-cpp-build.ts # tools/ui/src/app.css # tools/ui/src/app.d.ts # tools/ui/src/app.html # tools/ui/src/lib/actions/fade-in-view.svelte.ts # tools/ui/src/lib/components/app/badges/BadgeInfo.svelte # tools/ui/src/lib/components/app/badges/index.ts # tools/ui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentsList/ChatAttachmentsListItem/ChatAttachmentsListItemMcpPrompt.svelte # tools/ui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentsList/ChatAttachmentsListItem/ChatAttachmentsListItemMcpResource.svelte # 
tools/ui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentsList/ChatAttachmentsListItem/ChatAttachmentsListItemThumbnailImage.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatForm.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormActions/ChatFormActionRecord.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormActions/ChatFormActionSubmit.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormFileInputInvisible.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormMcpResourcesList.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPicker/ChatFormPickerItemHeader.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPicker/ChatFormPickerList.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPicker/ChatFormPickerListItem.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPicker/ChatFormPickerListItemSkeleton.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPicker/ChatFormPickerPopover.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPickerMcpPrompts/ChatFormPickerMcpPrompts.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPickerMcpPrompts/ChatFormPromptPickerArgumentForm.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPickerMcpPrompts/ChatFormPromptPickerArgumentInput.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormPickers/ChatFormPickerMcpResources.svelte # tools/ui/src/lib/components/app/chat/ChatForm/ChatFormTextarea.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessage.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessageAssistant/ChatMessageAssistant.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessageMcpPrompt/ChatMessageMcpPrompt.svelte # 
tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessageMcpPrompt/ChatMessageMcpPromptContent.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessageSystem/ChatMessageSystem.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessageActions/ChatMessageActionIcons/ChatMessageActionIcons.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessageAgenticContent.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessageEditForm.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessageStatistics/ChatMessageStatistics.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessageStatistics/ChatMessageStatisticsBadge.svelte # tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessages.svelte # tools/ui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte # tools/ui/src/lib/components/app/chat/ChatScreen/ChatScreenDragOverlay.svelte # tools/ui/src/lib/components/app/chat/ChatScreen/ChatScreenForm.svelte # tools/ui/src/lib/components/app/chat/ChatScreen/ChatScreenProcessingInfo.svelte # tools/ui/src/lib/components/app/chat/index.ts # tools/ui/src/lib/components/app/content/CollapsibleContentBlock.svelte # tools/ui/src/lib/components/app/content/MarkdownContent/MarkdownContent.svelte # tools/ui/src/lib/components/app/content/MarkdownContent/plugins/rehype/enhance-links.ts # tools/ui/src/lib/components/app/content/MarkdownContent/plugins/rehype/rehype-rtl-support.ts # tools/ui/src/lib/components/app/content/MarkdownContent/plugins/rehype/resolve-attachment-images.ts # tools/ui/src/lib/components/app/content/MarkdownContent/plugins/rehype/table-html-restorer.ts # tools/ui/src/lib/components/app/content/MarkdownContent/plugins/remark/literal-html.ts # tools/ui/src/lib/components/app/content/SyntaxHighlightedCode.svelte # tools/ui/src/lib/components/app/content/index.ts # tools/ui/src/lib/components/app/dialogs/DialogChatError.svelte # 
tools/ui/src/lib/components/app/dialogs/DialogCodePreview.svelte # tools/ui/src/lib/components/app/dialogs/DialogConfirmation.svelte # tools/ui/src/lib/components/app/dialogs/DialogConversationSelection.svelte # tools/ui/src/lib/components/app/dialogs/DialogConversationTitleUpdate.svelte # tools/ui/src/lib/components/app/dialogs/DialogEmptyFileAlert.svelte # tools/ui/src/lib/components/app/dialogs/DialogMcpResourcePreview.svelte # tools/ui/src/lib/components/app/dialogs/DialogMcpResourcesBrowser.svelte # tools/ui/src/lib/components/app/dialogs/DialogModelInformation.svelte # tools/ui/src/lib/components/app/dialogs/DialogModelNotAvailable.svelte # tools/ui/src/lib/components/app/dialogs/index.ts # tools/ui/src/lib/components/app/forms/InputWithSuggestions.svelte # tools/ui/src/lib/components/app/forms/KeyValuePairs.svelte # tools/ui/src/lib/components/app/forms/SearchInput.svelte # tools/ui/src/lib/components/app/forms/index.ts # tools/ui/src/lib/components/app/index.ts # tools/ui/src/lib/components/app/mcp/McpCapabilitiesBadges.svelte # tools/ui/src/lib/components/app/mcp/McpConnectionLogs.svelte # tools/ui/src/lib/components/app/mcp/McpLogo.svelte # tools/ui/src/lib/components/app/mcp/McpResourcePreview.svelte # tools/ui/src/lib/components/app/mcp/McpResourceTemplateForm.svelte # tools/ui/src/lib/components/app/mcp/McpResourcesBrowser/McpResourcesBrowser.svelte # tools/ui/src/lib/components/app/mcp/McpResourcesBrowser/McpResourcesBrowserEmptyState.svelte # tools/ui/src/lib/components/app/mcp/McpResourcesBrowser/McpResourcesBrowserHeader.svelte # tools/ui/src/lib/components/app/mcp/McpResourcesBrowser/McpResourcesBrowserServerItem.svelte # tools/ui/src/lib/components/app/mcp/McpResourcesBrowser/mcp-resources-browser.ts # tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCard.svelte # tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCardActions.svelte # tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCardDeleteDialog.svelte # 
tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCardEditForm.svelte # tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCardHeader.svelte # tools/ui/src/lib/components/app/mcp/McpServerCard/McpServerCardToolsList.svelte # tools/ui/src/lib/components/app/mcp/McpServerCardSkeleton.svelte # tools/ui/src/lib/components/app/mcp/McpServerForm.svelte # tools/ui/src/lib/components/app/mcp/McpServerInfo.svelte # tools/ui/src/lib/components/app/mcp/index.ts # tools/ui/src/lib/components/app/misc/ConversationSelection.svelte # tools/ui/src/lib/components/app/misc/HorizontalScrollCarousel.svelte # tools/ui/src/lib/components/app/misc/KeyboardShortcutInfo.svelte # tools/ui/src/lib/components/app/misc/TruncatedText.svelte # tools/ui/src/lib/components/app/misc/index.ts # tools/ui/src/lib/components/app/models/ModelBadge.svelte # tools/ui/src/lib/components/app/models/ModelId.svelte # tools/ui/src/lib/components/app/models/ModelsSelectorList.svelte # tools/ui/src/lib/components/app/models/ModelsSelectorOption.svelte # tools/ui/src/lib/components/app/models/index.ts # tools/ui/src/lib/components/app/models/utils.ts # tools/ui/src/lib/components/app/navigation/DropdownMenuActions.svelte # tools/ui/src/lib/components/app/navigation/DropdownMenuSearchable.svelte # tools/ui/src/lib/components/app/navigation/SidebarNavigation/SidebarNavigation.svelte # tools/ui/src/lib/components/app/navigation/SidebarNavigation/SidebarNavigationConversationItem.svelte # tools/ui/src/lib/components/app/navigation/SidebarNavigation/SidebarNavigationSearch.svelte # tools/ui/src/lib/components/app/server/ServerErrorSplash.svelte # tools/ui/src/lib/components/app/server/ServerLoadingSplash.svelte # tools/ui/src/lib/components/app/server/ServerStatus.svelte # tools/ui/src/lib/components/app/server/index.ts # tools/ui/src/lib/components/app/settings/SettingsChat/SettingsChatFields.svelte # tools/ui/src/lib/components/app/settings/SettingsChat/SettingsChatImportExportTab.svelte # 
tools/ui/src/lib/components/app/settings/SettingsChat/SettingsChatParameterSourceIndicator.svelte # tools/ui/src/lib/components/app/settings/SettingsFooter.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-action.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-cancel.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-content.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-description.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-footer.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-header.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-overlay.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-title.svelte # tools/ui/src/lib/components/ui/alert-dialog/alert-dialog-trigger.svelte # tools/ui/src/lib/components/ui/alert-dialog/index.ts # tools/ui/src/lib/components/ui/alert/alert-description.svelte # tools/ui/src/lib/components/ui/alert/alert-title.svelte # tools/ui/src/lib/components/ui/alert/alert.svelte # tools/ui/src/lib/components/ui/alert/index.ts # tools/ui/src/lib/components/ui/badge/badge.svelte # tools/ui/src/lib/components/ui/badge/index.ts # tools/ui/src/lib/components/ui/button/button.svelte # tools/ui/src/lib/components/ui/button/index.ts # tools/ui/src/lib/components/ui/card/card-action.svelte # tools/ui/src/lib/components/ui/card/card-content.svelte # tools/ui/src/lib/components/ui/card/card-description.svelte # tools/ui/src/lib/components/ui/card/card-footer.svelte # tools/ui/src/lib/components/ui/card/card-header.svelte # tools/ui/src/lib/components/ui/card/card-title.svelte # tools/ui/src/lib/components/ui/card/card.svelte # tools/ui/src/lib/components/ui/card/index.ts # tools/ui/src/lib/components/ui/checkbox/checkbox.svelte # tools/ui/src/lib/components/ui/checkbox/index.ts # tools/ui/src/lib/components/ui/collapsible/collapsible-content.svelte # 
tools/ui/src/lib/components/ui/collapsible/collapsible-trigger.svelte # tools/ui/src/lib/components/ui/collapsible/collapsible.svelte # tools/ui/src/lib/components/ui/collapsible/index.ts # tools/ui/src/lib/components/ui/dialog/dialog-close.svelte # tools/ui/src/lib/components/ui/dialog/dialog-content.svelte # tools/ui/src/lib/components/ui/dialog/dialog-description.svelte # tools/ui/src/lib/components/ui/dialog/dialog-footer.svelte # tools/ui/src/lib/components/ui/dialog/dialog-header.svelte # tools/ui/src/lib/components/ui/dialog/dialog-overlay.svelte # tools/ui/src/lib/components/ui/dialog/dialog-title.svelte # tools/ui/src/lib/components/ui/dialog/dialog-trigger.svelte # tools/ui/src/lib/components/ui/dialog/index.ts # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-checkbox-item.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-content.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-group-heading.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-group.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-item.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-label.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-radio-group.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-radio-item.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-separator.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-shortcut.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-sub-trigger.svelte # tools/ui/src/lib/components/ui/dropdown-menu/dropdown-menu-trigger.svelte # tools/ui/src/lib/components/ui/dropdown-menu/index.ts # tools/ui/src/lib/components/ui/input/index.ts # tools/ui/src/lib/components/ui/input/input.svelte # tools/ui/src/lib/components/ui/label/index.ts # tools/ui/src/lib/components/ui/label/label.svelte # tools/ui/src/lib/components/ui/popover/index.ts # 
tools/ui/src/lib/components/ui/popover/popover-close.svelte # tools/ui/src/lib/components/ui/popover/popover-content.svelte # tools/ui/src/lib/components/ui/popover/popover-portal.svelte # tools/ui/src/lib/components/ui/popover/popover-trigger.svelte # tools/ui/src/lib/components/ui/popover/popover.svelte # tools/ui/src/lib/components/ui/scroll-area/index.ts # tools/ui/src/lib/components/ui/scroll-area/scroll-area-scrollbar.svelte # tools/ui/src/lib/components/ui/scroll-area/scroll-area.svelte # tools/ui/src/lib/components/ui/select/index.ts # tools/ui/src/lib/components/ui/select/select-content.svelte # tools/ui/src/lib/components/ui/select/select-group-heading.svelte # tools/ui/src/lib/components/ui/select/select-group.svelte # tools/ui/src/lib/components/ui/select/select-item.svelte # tools/ui/src/lib/components/ui/select/select-label.svelte # tools/ui/src/lib/components/ui/select/select-scroll-down-button.svelte # tools/ui/src/lib/components/ui/select/select-scroll-up-button.svelte # tools/ui/src/lib/components/ui/select/select-separator.svelte # tools/ui/src/lib/components/ui/select/select-trigger.svelte # tools/ui/src/lib/components/ui/separator/index.ts # tools/ui/src/lib/components/ui/separator/separator.svelte # tools/ui/src/lib/components/ui/sheet/index.ts # tools/ui/src/lib/components/ui/sheet/sheet-close.svelte # tools/ui/src/lib/components/ui/sheet/sheet-content.svelte # tools/ui/src/lib/components/ui/sheet/sheet-description.svelte # tools/ui/src/lib/components/ui/sheet/sheet-footer.svelte # tools/ui/src/lib/components/ui/sheet/sheet-header.svelte # tools/ui/src/lib/components/ui/sheet/sheet-overlay.svelte # tools/ui/src/lib/components/ui/sheet/sheet-title.svelte # tools/ui/src/lib/components/ui/sheet/sheet-trigger.svelte # tools/ui/src/lib/components/ui/sidebar/constants.ts # tools/ui/src/lib/components/ui/sidebar/context.svelte.ts # tools/ui/src/lib/components/ui/sidebar/index.ts # tools/ui/src/lib/components/ui/sidebar/sidebar-content.svelte # 
tools/ui/src/lib/components/ui/sidebar/sidebar-footer.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-group-action.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-group-content.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-group-label.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-group.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-header.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-input.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-inset.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-action.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-badge.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-button.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-item.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-skeleton.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-sub-button.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-sub-item.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu-sub.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-menu.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-provider.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-rail.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-separator.svelte # tools/ui/src/lib/components/ui/sidebar/sidebar-trigger.svelte # tools/ui/src/lib/components/ui/skeleton/index.ts # tools/ui/src/lib/components/ui/skeleton/skeleton.svelte # tools/ui/src/lib/components/ui/switch/index.ts # tools/ui/src/lib/components/ui/switch/switch.svelte # tools/ui/src/lib/components/ui/table/index.ts # tools/ui/src/lib/components/ui/table/table-body.svelte # tools/ui/src/lib/components/ui/table/table-caption.svelte # tools/ui/src/lib/components/ui/table/table-cell.svelte # tools/ui/src/lib/components/ui/table/table-footer.svelte # tools/ui/src/lib/components/ui/table/table-head.svelte # tools/ui/src/lib/components/ui/table/table-header.svelte # 
tools/ui/src/lib/components/ui/table/table-row.svelte # tools/ui/src/lib/components/ui/table/table.svelte # tools/ui/src/lib/components/ui/textarea/index.ts # tools/ui/src/lib/components/ui/textarea/textarea.svelte # tools/ui/src/lib/components/ui/tooltip/index.ts # tools/ui/src/lib/components/ui/tooltip/tooltip-content.svelte # tools/ui/src/lib/components/ui/tooltip/tooltip-trigger.svelte # tools/ui/src/lib/components/ui/utils.ts # tools/ui/src/lib/constants/agentic.ts # tools/ui/src/lib/constants/attachment-labels.ts # tools/ui/src/lib/constants/auto-scroll.ts # tools/ui/src/lib/constants/binary-detection.ts # tools/ui/src/lib/constants/cache.ts # tools/ui/src/lib/constants/chat-form.ts # tools/ui/src/lib/constants/code-blocks.ts # tools/ui/src/lib/constants/code.ts # tools/ui/src/lib/constants/css-classes.ts # tools/ui/src/lib/constants/floating-ui-constraints.ts # tools/ui/src/lib/constants/formatters.ts # tools/ui/src/lib/constants/index.ts # tools/ui/src/lib/constants/key-value-pairs.ts # tools/ui/src/lib/constants/latex-protection.ts # tools/ui/src/lib/constants/literal-html.ts # tools/ui/src/lib/constants/markdown.ts # tools/ui/src/lib/constants/max-bundle-size.ts # tools/ui/src/lib/constants/mcp-form.ts # tools/ui/src/lib/constants/mcp-resource.ts # tools/ui/src/lib/constants/mcp.ts # tools/ui/src/lib/constants/message-export.ts # tools/ui/src/lib/constants/model-id.ts # tools/ui/src/lib/constants/precision.ts # tools/ui/src/lib/constants/processing-info.ts # tools/ui/src/lib/constants/settings-keys.ts # tools/ui/src/lib/constants/supported-file-types.ts # tools/ui/src/lib/constants/table-html-restorer.ts # tools/ui/src/lib/constants/tooltip-config.ts # tools/ui/src/lib/constants/uri-template.ts # tools/ui/src/lib/constants/viewport.ts # tools/ui/src/lib/contexts/chat-actions.context.ts # tools/ui/src/lib/contexts/index.ts # tools/ui/src/lib/contexts/message-edit.context.ts # tools/ui/src/lib/enums/agentic.enums.ts # tools/ui/src/lib/enums/chat.enums.ts # 
tools/ui/src/lib/enums/files.enums.ts # tools/ui/src/lib/enums/keyboard.enums.ts # tools/ui/src/lib/enums/mcp.enums.ts # tools/ui/src/lib/enums/model.enums.ts # tools/ui/src/lib/enums/server.enums.ts # tools/ui/src/lib/enums/settings.enums.ts # tools/ui/src/lib/enums/ui.enums.ts # tools/ui/src/lib/hooks/use-auto-scroll.svelte.ts # tools/ui/src/lib/hooks/use-processing-state.svelte.ts # tools/ui/src/lib/services/chat.service.ts # tools/ui/src/lib/services/database.service.ts # tools/ui/src/lib/services/index.ts # tools/ui/src/lib/services/mcp.service.ts # tools/ui/src/lib/services/models.service.ts # tools/ui/src/lib/services/parameter-sync.service.spec.ts # tools/ui/src/lib/services/props.service.ts # tools/ui/src/lib/stores/agentic.svelte.ts # tools/ui/src/lib/stores/chat.svelte.ts # tools/ui/src/lib/stores/conversations.svelte.ts # tools/ui/src/lib/stores/mcp-resources.svelte.ts # tools/ui/src/lib/stores/mcp.svelte.ts # tools/ui/src/lib/stores/models.svelte.ts # tools/ui/src/lib/stores/persisted.svelte.ts # tools/ui/src/lib/stores/server.svelte.ts # tools/ui/src/lib/stores/settings.svelte.ts # tools/ui/src/lib/types/agentic.d.ts # tools/ui/src/lib/types/api.d.ts # tools/ui/src/lib/types/chat.d.ts # tools/ui/src/lib/types/common.d.ts # tools/ui/src/lib/types/database.d.ts # tools/ui/src/lib/types/index.ts # tools/ui/src/lib/types/mcp.d.ts # tools/ui/src/lib/types/models.d.ts # tools/ui/src/lib/types/settings.d.ts # tools/ui/src/lib/utils/abort.ts # tools/ui/src/lib/utils/agentic.ts # tools/ui/src/lib/utils/api-fetch.ts # tools/ui/src/lib/utils/api-headers.ts # tools/ui/src/lib/utils/api-key-validation.ts # tools/ui/src/lib/utils/attachment-display.ts # tools/ui/src/lib/utils/attachment-type.ts # tools/ui/src/lib/utils/audio-recording.ts # tools/ui/src/lib/utils/autoresize-textarea.ts # tools/ui/src/lib/utils/branching.ts # tools/ui/src/lib/utils/browser-only.ts # tools/ui/src/lib/utils/cache-ttl.ts # tools/ui/src/lib/utils/clipboard.ts # 
tools/ui/src/lib/utils/code.ts # tools/ui/src/lib/utils/config-helpers.ts # tools/ui/src/lib/utils/conversation-utils.ts # tools/ui/src/lib/utils/convert-files-to-extra.ts # tools/ui/src/lib/utils/cors-proxy.ts # tools/ui/src/lib/utils/data-url.ts # tools/ui/src/lib/utils/debounce.ts # tools/ui/src/lib/utils/file-preview.ts # tools/ui/src/lib/utils/file-type.ts # tools/ui/src/lib/utils/formatters.ts # tools/ui/src/lib/utils/headers.ts # tools/ui/src/lib/utils/image-error-fallback.ts # tools/ui/src/lib/utils/index.ts # tools/ui/src/lib/utils/is-ime-composing.ts # tools/ui/src/lib/utils/latex-protection.ts # tools/ui/src/lib/utils/legacy-migration.ts # tools/ui/src/lib/utils/mcp.ts # tools/ui/src/lib/utils/modality-file-validation.ts # tools/ui/src/lib/utils/model-names.ts # tools/ui/src/lib/utils/pdf-processing.ts # tools/ui/src/lib/utils/portal-to-body.ts # tools/ui/src/lib/utils/precision.ts # tools/ui/src/lib/utils/process-uploaded-files.ts # tools/ui/src/lib/utils/redact.ts # tools/ui/src/lib/utils/request-helpers.ts # tools/ui/src/lib/utils/sanitize.ts # tools/ui/src/lib/utils/svg-to-png.ts # tools/ui/src/lib/utils/syntax-highlight-language.ts # tools/ui/src/lib/utils/text-files.ts # tools/ui/src/lib/utils/text.ts # tools/ui/src/lib/utils/uri-template.ts # tools/ui/src/lib/utils/uuid.ts # tools/ui/src/lib/utils/webp-to-png.ts # tools/ui/src/routes/(chat)/+page.svelte # tools/ui/src/routes/(chat)/+page.ts # tools/ui/src/routes/(chat)/chat/[id]/+page.svelte # tools/ui/src/routes/(chat)/chat/[id]/+page.ts # tools/ui/src/routes/+error.svelte # tools/ui/src/routes/+layout.svelte # tools/ui/src/styles/katex-custom.scss # tools/ui/static/favicon.svg # tools/ui/static/tts-default-ref.mp3 # tools/ui/svelte.config.js # tools/ui/tests/client/components/TestWrapper.svelte # tools/ui/tests/client/page.svelte.test.ts # tools/ui/tests/stories/ChatMessage.stories.svelte # tools/ui/tests/stories/ChatScreenForm.stories.svelte # tools/ui/tests/stories/Introduction.mdx # 
tools/ui/tests/stories/MarkdownContent.stories.svelte # tools/ui/tests/stories/SidebarNavigation.stories.svelte # tools/ui/tests/stories/fixtures/ai-tutorial.ts # tools/ui/tests/stories/fixtures/api-docs.ts # tools/ui/tests/stories/fixtures/assets/1.jpg # tools/ui/tests/stories/fixtures/assets/beautiful-flowers-lotus.webp # tools/ui/tests/stories/fixtures/assets/example.pdf # tools/ui/tests/stories/fixtures/assets/hf-logo.svg # tools/ui/tests/stories/fixtures/blog-post.ts # tools/ui/tests/stories/fixtures/data-analysis.ts # tools/ui/tests/stories/fixtures/empty.ts # tools/ui/tests/stories/fixtures/math-formulas.ts # tools/ui/tests/stories/fixtures/readme.ts # tools/ui/tests/stories/fixtures/storybook-mocks.ts # tools/ui/tests/unit/agentic-sections.test.ts # tools/ui/tests/unit/agentic-strip.test.ts # tools/ui/tests/unit/clipboard.test.ts # tools/ui/tests/unit/latex-protection.test.ts # tools/ui/tests/unit/mcp-service.test.ts # tools/ui/tests/unit/model-id-parser.test.ts # tools/ui/tests/unit/model-names.test.ts # tools/ui/tests/unit/reasoning-context.test.ts # tools/ui/tests/unit/redact.test.ts # tools/ui/tests/unit/request-helpers.test.ts # tools/ui/tests/unit/sanitize-headers.test.ts # tools/ui/tests/unit/uri-template.test.ts # tools/ui/tsconfig.json # tools/ui/vite.config.ts

daniandtheweb and others added 30 commits May 20, 2026 17:15

vulkan: optimize operations in the IM2COL shader (ggml-org#22685)

acd604f

* vulkan: optimize operations in the IM2COL shader
* Add comments and improve the code formatting

common/speculative : fix nullptr crash in get_devices_str (ggml-org#23386)

510b5c2

ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name.

Assisted-by: llama.cpp:local pi

app : show version (ggml-org#23426)

ce02093

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

ggml : Check the right iface method before using the fallback 2d get (ggml-org#23306)

2754ce1

Probably no backends implement only one of 2d get/set, but this might be annoying for some future backend developer trying to add 2d get/set.

ui: Improve Git Hooks for UI development (ggml-org#23403)

5e932a1

* refactor: Improve Git Hooks for UI development
* fix: Address review comments
* fix: Use absolute git path for `/hooks`

Co-authored-by: Pascal <admin@serveurperso.com>

---------

Co-authored-by: Pascal <admin@serveurperso.com>

doc: fix spec mtp typo (ggml-org#23435)

2fc8d18

mtp: use inp_out_ids for skipping logit computation (ggml-org#23433)

12e5d99

when doing a follow-up decode for the draft model, we were always doing the logit computation even though it is not required.

server: re-inject subcommand when router spawns children under unified binary (ggml-org#23442)

c902171

fix(flash-attn): replace f32 with kv_type and q_type (ggml-org#23372)

5306f4b

Update WebGPU support and add link to blog/demo (ggml-org#23483)

ee7c305

CUDA: fix PDL CC check for JIT compilation (ggml-org#23471)

4f0e43d

vocab : fix HybridDNA tokenizer (ggml-org#23466)

afcda09

* vocab : mark hybriddna k-mers to avoid BPE token collisions
* improved loop

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

cmake : build router app only during standalone builds (ggml-org#23521)

9c92e96

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

ggml-zendnn : add Q8_0 quantization support (ggml-org#23414)

99d4026

* ggml-zendnn : add Q8_0 quantization support
* ggml-zendnn : sync with latest ZenDNN
* ggml-zendnn : address review comments for Q8_0

Xarbirus and others added 25 commits June 2, 2026 22:11

hexagon: profiler output fix and script updates (ggml-org#24042)

5c394fd

* hex-ops: fix profiler output (ie remove the redundant NONEs)
* hex-prof: update profiling script to support tot.usec column

opencl: use flat variants of q4_K and q6_K gemv for very large M (ggml-org#24006)

63e66fd

arg : removed unecesary mmproj download when users pass --no-mmproj (ggml-org#23425)

e366626

ci : disable ccache for msvc windows release jobs (ggml-org#23911)

4da6370

update BoringSSL to 0.20260526.0 (ggml-org#23794)

d545a2a

tests : add support for qwen3 SSM archs (ggml-org#24031)

06938ac

* tests : add support for qwen3 SSM archs
* arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS
* cont : naming + TODOs

ggml-cpu: use runtime SVE width in FWHT (ggml-org#24059)

3571fa5

ui: Mermaid Diagrams in chat + interactive preview (ggml-org#24032)

ee4cf70

mtmd, model: allow skip build_vit() (ggml-org#24077)

a731805

* add model
* nits

mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)

c8d6a00

qwen35: use post-norm hidden state for MTP (ggml-org#24025)

166fe29

* qwen35: use post-norm hidden state for MTP
* rename pre_norm to nextn
* fix step35

mtmd: fix Gemma 4 unified FPE (ggml-org#24088)

94a220c

metal : reduce rset heartbeat from 500ms -> 5ms (ggml-org#24074)

3d19986

readme : add status badges (ggml-org#24104)

6ddc943

fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)

e3ba22d

* mtmd: handle Gemma 4 audio projector embedding size
* rm projection_dim from clip_n_mmproj_embd

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

cmake: skip cvector-generator and export-lora when CPU backend is disabled (ggml-org#24053)

7ac5a42

server : add header to tools/server/server-http.h (ggml-org#24089)

0066404

marksverdhei merged commit 40ac62d into ht Jun 4, 2026
9 of 26 checks passed

marksverdhei deleted the sync/master-2026-06-04 branch June 4, 2026 14:16

This was referenced Jun 4, 2026

feat(dflash): integrate DFlash block-diffusion speculative decoder (rebased on post-rewrite ht) #62

Merged

fix(tests): TBQ block-size + tolerances after 128-block migration #63

Merged

Rebase conflict: ht on master (2026-05-18) #51

Closed

marksverdhei merged 548 commits into ht from sync/master-2026-06-04

marksverdhei commented Jun 4, 2026

20 participants