Add install() for impl libraries and fix Apple/Android builds #1
Merged
* pi : update
* ci : fix ios build
* ci : fix android
* ci : fix apple builds
* cmake : add install() for impl libraries

Add install(TARGETS <target> LIBRARY) for all -impl libraries that were changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in commit bb28c1f. Without this, cmake --install fails to copy the shared libraries, causing runtime errors like:

llama-server: error while loading shared libraries: libllama-server-impl.so

Ref: #23494 (comment)

Assisted-by: llama.cpp:local pi

* ci : fix xcframework build
* vocab : mark hybriddna k-mers to avoid BPE token collisions
* improved loop

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
* ggml-zendnn : add Q8_0 quantization support
* ggml-zendnn : sync with latest ZenDNN
* ggml-zendnn : address review comments for Q8_0
* SYCL: add BF16 to DMMV kernel path for ~4x token generation speedup

BF16 models had no dedicated token generation kernel; they fell through to the generic full-GEMM path, resulting in ~14% memory bandwidth utilization on Intel Arc GPUs. This adds BF16 support to the DMMV (dequantize mul-mat-vec) path, matching the existing F16 implementation.

Fixes #20478

* SYCL: fix BF16 DMMV out-of-bounds when ncols % 64 != 0

The qk=1 kernel (used for F16 and BF16) iterates with stride 2*GGML_SYCL_DMMV_X (= 64 on Intel targets where WARP_SIZE=16). When ncols is a multiple of DMMV_X (32) but not of 2*DMMV_X (64), the last warp iteration accesses elements at col >= ncols, producing NaN for the final row and wrong values for interior rows.

Fix: tighten can_use_dequantize_mul_mat_vec to require ne[0] % (2*DMMV_X) == 0 for F16/BF16 types, and update the ASSERT in the BF16 launcher to match. Quantized types use block-structured kernels with different access patterns and keep the existing DMMV_X check.

Verified: test-backend-ops MUL_MAT passes 913/913 on Intel Arc Pro B70. Previously failing: m=128/129 n=1 k=1056 cases (NaN and ERR > 0.0005).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* sycl_gated_delta_net K>1
* editor_config
* [SYCL] Centralize Level Zero detection in ggml_sycl_init
* use the same wording
* get back the warning
- change `k_copy_src1_to_contiguous` so that it uses a precomputed contiguous mapping in which all rows "owned" by an expert sit in one slice with known start and end offsets
- replace the `O(n_as * n_routed_rows)` approach with a counting-sort-based procedure of `O(n_as + n_routed_rows)` complexity
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
cmake : add install() for impl libraries + fix apple builds (ggml-org#23511)
ggerganov authored 5 hours ago
vocab : fix HybridDNA tokenizer (https://github.com/ggml-org/llama.cpp/pull/23466), commit [afcda09](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/afcda09d154a285cd366135f98ffc1d357f7ddbd)
kashif and CISC authored 4 hours ago
cmake : build router app only during standalone builds (https://github.com/ggml-org/llama.cpp/pull/23521), commit [9c92e96](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/9c92e96a64fe0f03f5f3e5ab720a151941da1de5)
fairydreaming and sszymczy authored 4 hours ago
ggml-zendnn : add Q8_0 quantization support (https://github.com/ggml-org/llama.cpp/pull/23414), commit [99d4026](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/99d4026b116605ed8e1f3ab179b3c63bc4637195)
z-sachin authored 2 hours ago
docs: Update documentation with Granite 4.0/4.1 (https://github.com/ggml-org/llama.cpp/pull/23404), commit [95feeab](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/95feeab52e41ceaf71e87b2dd01895f6d8815b60)
jesus-talavera-ibm authored 1 hour ago
SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (ggml-org#21580)
PMZFX and claude authored 1 hour ago
SYCL : gated_delta_net K>1 (https://github.com/ggml-org/llama.cpp/pull/23174), commit [56f16f2](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/56f16f235c4a6ffd0cd316e1d4b5dcfbf2dcb7a4)
karavayev authored 1 hour ago
sycl : Level Zero detection in ggml_sycl_init (https://github.com/ggml-org/llama.cpp/pull/23097), commit [bcfd198](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/bcfd1989e9a90af74669d94057ff2468682c3f4a)
sanmai authored 1 hour ago
SYCL: improve MoE prefill throughput (https://github.com/ggml-org/llama.cpp/pull/23142), commit [cc9e331](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/cc9e331213b6a9cb186aabe01a4ec6a61419dd80)
sanmai authored 1 hour ago
perplexity : fix integer overflow (https://github.com/ggml-org/llama.cpp/pull/23496), commit [ef570f6](https://github.com/THEman6989/llama.cpp-gfx906-turbo-mtp/commit/ef570f63087b6a5a2930210a13f87990e8113927)
fairydreaming and sszymczy authored 1 hour ago