cmake : fix LLAMA_BUILD_UI logic #23190
Conversation
Hmm, some checks are failing... let's re-run them and make sure that all is working before merging
I've been seeing some of these tests fail on other PRs as well. I'll wait for all of them to complete.
Should the CMAKE flag in package.nix also be updated?
Looks like it's just the WebGPU failing now, which seems to be currently broken. Merging this in.
Cherry-picked from upstream ggml-org/llama.cpp@87589042c (merged 2026-05-17).
option(LLAMA_BUILD_WEBUI ... ON) always leaves the deprecated flag DEFINED, so the compat-block guard `AND NOT DEFINED LLAMA_BUILD_UI` never fires. tools/ui/CMakeLists.txt then ORs both flags, so passing only the new `-DLLAMA_BUILD_UI=OFF` was silently ignored.
Removes the deprecated options and simplifies the compat block + UI gate to a single flag.
Fixes the nix-sandbox build failure reported by @arch-fan and @pacak on PR #146: both hit the resulting xxd.cmake crash when an empty tools/ui/dist/index.html was produced by failed npm + HF Bucket provisioning.
After this cherry-pick, `-DLLAMA_BUILD_UI=OFF` alone works as documented.
Co-Authored-By: TheTom <tturney@psyguard.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
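The flag interaction described in this commit message can be modelled in a few lines of C++. This is only a sketch of the logic, not the CMake code itself: `Flag`, `ui_enabled_before` and `ui_enabled_after` are hypothetical names, `std::nullopt` stands for "not passed with -D", and the ON default for `LLAMA_BUILD_UI` is an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of a CMake cache flag: nullopt means "not passed with -D".
using Flag = std::optional<bool>;

// Before the fix: the option() call gives the deprecated flag a value (ON)
// even when the user never set it, and the UI gate ORs both flags.
bool ui_enabled_before(Flag webui, Flag ui) {
    const bool webui_value = webui.value_or(true); // option() default, per the commit message
    const bool ui_value    = ui.value_or(true);    // assumed default
    return webui_value || ui_value;
}

// After the fix: a single flag decides. The deprecated flag only matters
// when the user passed it explicitly, in which case it takes priority.
bool ui_enabled_after(Flag webui, Flag ui) {
    if (webui.has_value()) {
        return *webui;
    }
    return ui.value_or(true); // assumed default
}
```

In this model, `ui_enabled_before(std::nullopt, false)` is still `true`, which mirrors `-DLLAMA_BUILD_UI=OFF` being silently ignored, while `ui_enabled_after(std::nullopt, false)` is `false`.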
Round 2 of CI fixes addressing the remaining red jobs on the b9190 sync
PR. All were pre-existing TQ-tip bugs exposed by upstream CI's -Werror
policy (M5 Max + M2 mini local builds don't use -Werror).
1. ggml/src/ggml-cuda/fattn-mma-f16.cuh — fall back to ampere config
(not zero-sentinel) in get_config_rdna
----------------------------------------------------------------
Reverts the round-1 conflict choice. Round 1 took upstream's new
sentinel `fattn_mma_config(32, 1, 0, 0, 0, 0, 0, false)` for the
RDNA fallback. Template instances like
fattn-mma-f16-instance-ncols1_1-ncols2_16.cu do constexpr arithmetic
on the returned config (np = nwarps * cols_per_warp / ncols, etc).
nwarps=0 from the sentinel propagates to np=0, triggering compile-
time div/mod-by-zero at lines 1265/1371/1375/1512/1519/1572. HIP
quality build is -Werror,-Wdivision-by-zero so it errors out.
TQ-tip behavior (delegate to ampere) returns a valid config —
restore it. Keeps all (640, 512) RDNA entries unioned in round 1.
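A minimal sketch of why a zero sentinel breaks downstream constexpr arithmetic while a real fallback config does not. `mma_config`, `zero_sentinel`, `ampere_fallback`, `rows_per_group` and all numeric values are hypothetical stand-ins, not the types or numbers in fattn-mma-f16.cuh; only the `np = nwarps * cols_per_warp / ncols` formula comes from the text above.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for the real config type.
struct mma_config {
    int nwarps;
    int cols_per_warp;
};

constexpr mma_config zero_sentinel()   { return {0, 0}; }
constexpr mma_config ampere_fallback() { return {4, 8}; } // made-up values

// np = nwarps * cols_per_warp / ncols, as quoted in the commit message.
constexpr int compute_np(mma_config c, int ncols) {
    return c.nwarps * c.cols_per_warp / ncols;
}

// Any later expression that divides by np inherits the problem.
constexpr int rows_per_group(mma_config c, int ncols, int rows) {
    return rows / compute_np(c, ncols);
}

static_assert(compute_np(zero_sentinel(), 16) == 0, "the sentinel yields np == 0");
static_assert(rows_per_group(ampere_fallback(), 16, 64) == 32, "a valid config divides cleanly");

// Uncommenting the next line does not compile: division by zero in a constant expression.
// constexpr int bad = rows_per_group(zero_sentinel(), 16, 64);
```

In this sketch the division sits in a constant expression, so the compiler rejects it outright; in the build described above it surfaced as a -Wdivision-by-zero diagnostic promoted to an error by -Werror.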
2. ggml/src/ggml-cuda/vendors/musa.h — add cudaMemcpyFromSymbol alias
----------------------------------------------------------------
turbo-quant.cuh InnerQ calibration uses both cudaMemcpyToSymbol AND
cudaMemcpyFromSymbol. Round-1 fix added _ToSymbol; _FromSymbol was
missed. Mirrors vendors/hip.h line 142.
3. src/llama-kv-cache.cpp — [[maybe_unused]] stubs + remove unused `il`
----------------------------------------------------------------
The non-CUDA stubs (g_innerq_finalized, g_innerq_scale_inv_host,
turbo_innerq_needs_tensor_update, turbo_innerq_mark_tensor_updated)
are declared static but every consumer is gated by #ifdef GGML_USE_CUDA,
so the file-local copies look unused on non-CUDA builds. Annotate
with [[maybe_unused]]. Also drops two `const uint32_t il = layer.il;`
locals in the state-save k/v writer loops where `il` was unreferenced —
dead-code from a removed logging pass.
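The pattern can be sketched as follows, assuming C++17 for `[[maybe_unused]]`. The identifiers are borrowed from the commit message, but their types and bodies here are invented, and `count_pending_updates` is a hypothetical consumer, not a function from llama-kv-cache.cpp.

```cpp
#include <cassert>

#ifdef GGML_USE_CUDA
// Assumption: with CUDA enabled, the real definitions live in the backend.
extern bool g_innerq_finalized;
bool turbo_innerq_needs_tensor_update();
#else
// File-local stubs. Nothing references them on non-CUDA builds, so without
// the attribute they trip -Wunused-variable / -Wunused-function under -Werror.
[[maybe_unused]] static bool g_innerq_finalized = false;
[[maybe_unused]] static bool turbo_innerq_needs_tensor_update() { return false; }
#endif

// Hypothetical consumer: the only use of the symbols is behind the same #ifdef.
int count_pending_updates(int n_layers) {
    int pending = 0;
#ifdef GGML_USE_CUDA
    for (int i = 0; i < n_layers; ++i) {
        if (!g_innerq_finalized && turbo_innerq_needs_tensor_update()) {
            ++pending;
        }
    }
#else
    (void) n_layers; // nothing to do without the CUDA path
#endif
    return pending;
}
```

The attribute keeps the stubs in place for any future non-CUDA caller while silencing the warnings, which is a smaller change than wrapping the stubs in yet another preprocessor guard.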
4. scripts/xxd.cmake — defensive quote of ${hex_data}
----------------------------------------------------------------
Belt-and-suspenders for the LLAMA_BUILD_UI nix-sandbox failure. The
primary fix is the cherry-pick of upstream PR ggml-org#23190 (previous
commit), which makes -DLLAMA_BUILD_UI=OFF actually work. This patch
makes the underlying xxd.cmake robust: when an empty UI source file
slips through, produce a 0-length .hpp instead of crashing with
cmake's cryptic "string sub-command LENGTH requires two arguments"
error. Worth proposing upstream as a follow-up.
Local Metal build green on M5 Max with all four fixes applied.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
arch-fan's next nix-sandbox build (after PR ggml-org#23190 cherry-pick + earlier empty-input defensive quote) hit a different xxd.cmake failure:
scripts/xxd.cmake:10 (file): file failed to open for reading (No such file or directory): /build/source/build/tools/ui/dist/bundle.js
Empty-file case (LENGTH error) was already handled by quoting the variable. This is the sibling case: file READ itself fails when the UI provisioning flow leaves an asset missing entirely (npm absent AND HF Bucket download blocked → some assets created empty, some not created at all).
Fix: early-return with a valid 0-byte symbol when ${INPUT} doesn't exist. Also unify the empty-content path to emit {0} instead of {} (zero-element array initializer is C++ extension, not portable).
Verified end-to-end on M5 Max by reproducing arch-fan's exact conditions: build/tools/ui/dist/ removed, PATH stripped of npm, LLAMA_USE_PREBUILT_UI=OFF. Without the fix, build crashes on bundle.js.hpp generation. With the fix, all four .hpp files generate as 0-byte symbols, llama-ui target completes cleanly, server builds with LLAMA_UI_DEFAULT_ENABLED=0 (no embedded UI but no crash), exactly upstream's intended graceful degradation.
No effect on normal builds with UI assets present (regenerated all 4 .hpp files at original 26MB / 2.5MB / 34KB / 1.4KB sizes, byte-identical to pre-fix output).
Worth proposing upstream as defensive hardening for the xxd helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
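For illustration, a generated header for a missing or empty asset might look roughly like the sketch below. The symbol names, the `unsigned int` length type and the `asset_available` helper are assumptions; the actual output of scripts/xxd.cmake may differ.

```cpp
#include <cassert>

// Hypothetical shape of a generated "0-byte symbol".
// One placeholder byte keeps the array declaration valid standard C++ ...
unsigned char bundle_js[] = {0};
// ... while the reported length says there is nothing to serve.
unsigned int bundle_js_len = 0;

// The empty-brace form is not portable: an array of unknown bound needs at
// least one initializer in standard C++ (some compilers accept it as an extension).
// unsigned char broken[] = {};

// Hypothetical consumer-side check: an asset counts as present only if it has bytes.
bool asset_available(const unsigned char * data, unsigned int len) {
    return data != nullptr && len > 0;
}
```

With this shape, code that embeds the asset still compiles and links, and a length check is enough to detect that no UI was bundled.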
Nix is actually broken because it does not allow network access during build.
Gentoo is broken for the same reason.
This reverts commit d4ee0dc. Upstream fixed this issue: ggml-org/llama.cpp#23190
You can download and copy WebUI files from an online machine: #23156 (comment)
* turboquant/HEAD: (82 commits)
  docs(readme): credit Google's original TurboQuant + explain the '+'
  docs(readme): fix turbo ladder ordering + cite K-compression paper
  docs(readme): reorder KV configs as a ladder + 'start light' guidance
  docs(readme): add Chronara to deployments + AtomicChat link
  docs: restructure README — professional layout, deployments, paper links
  docs: tighten README — add turbo2, missing features, paper links
  docs: keep upstream README, prepend fork-specific summary
  docs: replace upstream README with fork-specific summary
  fix(xxd.cmake): handle missing input file (not just empty)
  fix(ci): 4 cross-vendor -Werror failures + defensive xxd.cmake
  cmake : fix LLAMA_BUILD_UI logic (ggml-org#23190)
  fix(ggml-cuda): HIP nodiscard + MUSA cudaMemcpyToSymbol alias
  fix(turbo-quant): add forward declaration for turbo_cpu_fwht_inverse
  fix(metal): set ne12/ne13/r2/r3 function constants in mul_mm_tq_rotated pipeline
  webui: support video files as input (ggml-org#22830)
  server: (router) alloc tmp buffer on heap (ggml-org#23159)
  server: skip device enumeration in router mode to avoid creating CUDA primary context (ggml-org#23137)
  vulkan: removed duplicate #include <memory> in headers (ggml-org#23144)
  ui: Add request timeout for MCP tool calls (ggml-org#23138)
  sync : ggml
  ...
* turboquant: delected
  docs(readme): credit Google's original TurboQuant + explain the '+'
  docs(readme): fix turbo ladder ordering + cite K-compression paper
  docs(readme): reorder KV configs as a ladder + 'start light' guidance
  docs(readme): add Chronara to deployments + AtomicChat link
  docs: restructure README — professional layout, deployments, paper links
  docs: tighten README — add turbo2, missing features, paper links
  docs: keep upstream README, prepend fork-specific summary
  docs: replace upstream README with fork-specific summary
  fix(xxd.cmake): handle missing input file (not just empty)
  fix(ci): 4 cross-vendor -Werror failures + defensive xxd.cmake
  cmake : fix LLAMA_BUILD_UI logic (ggml-org#23190)
  fix(ggml-cuda): HIP nodiscard + MUSA cudaMemcpyToSymbol alias
  fix(turbo-quant): add forward declaration for turbo_cpu_fwht_inverse
  fix(metal): set ne12/ne13/r2/r3 function constants in mul_mm_tq_rotated pipeline
Overview
ref: #23156 (comment)
* Remove the option for `LLAMA_BUILD_WEBUI`, otherwise `DEFINED LLAMA_BUILD_WEBUI` always evaluates to `true`.
* Set `LLAMA_BUILD_UI` to the value of `LLAMA_BUILD_WEBUI` if set with `-D`. This will allow configuration with both `LLAMA_BUILD_UI` and `LLAMA_BUILD_WEBUI`.
* Passing `-DLLAMA_BUILD_WEBUI` and `-DLLAMA_BUILD_UI` with different values will prioritize `LLAMA_BUILD_WEBUI`. I think this is fine, as it's an obvious user error.
* Remove remaining references to `WEBUI`, coalesce everything to `UI`.
Additional information
Requirements