cmake : fix LLAMA_BUILD_UI logic #23190

Merged
aldehir merged 1 commit into ggml-org:master from aldehir:offline-build on May 17, 2026
Conversation

@aldehir
Contributor

aldehir commented May 17, 2026

Overview

ref: #23156 (comment)

  • Remove the option() declarations for LLAMA_BUILD_WEBUI; otherwise DEFINED LLAMA_BUILD_WEBUI always evaluates to true.
  • Simply set LLAMA_BUILD_UI to the value of LLAMA_BUILD_WEBUI if it was set with -D. This allows configuration with either LLAMA_BUILD_UI or LLAMA_BUILD_WEBUI.
  • Change in logic: specifying both -DLLAMA_BUILD_WEBUI and -DLLAMA_BUILD_UI with different values prioritizes LLAMA_BUILD_WEBUI. I think this is fine, as it's an obvious user error.
  • Clean up references to WEBUI and coalesce everything to UI.
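
A minimal sketch of the compatibility logic described in the bullets above, written as a standalone CMakeLists.txt. The project name, the help string, and the ON default for LLAMA_BUILD_UI are assumptions for illustration; this is not the code merged in the PR.

```cmake
cmake_minimum_required(VERSION 3.14)
project(ui_flag_sketch NONE)   # hypothetical project, no compiler needed

# Only the new name is declared with option().
option(LLAMA_BUILD_UI "build the embedded UI (assumed help text)" ON)

# The deprecated name is NOT declared with option(), so it is DEFINED
# only when the user actually passed -DLLAMA_BUILD_WEBUI=<value>.
if (DEFINED LLAMA_BUILD_WEBUI)
    # The deprecated flag wins when both are given.
    set(LLAMA_BUILD_UI "${LLAMA_BUILD_WEBUI}")
endif()

if (LLAMA_BUILD_UI)
    message(STATUS "UI: enabled")
else()
    message(STATUS "UI: disabled")
endif()
```

Configuring this sketch with -DLLAMA_BUILD_WEBUI=OFF -DLLAMA_BUILD_UI=ON should report the UI as disabled, which matches the priority rule stated in the third bullet.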

Additional information

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, to review. It gave some bad suggestions…

aldehir requested review from a team and ggerganov as code owners May 17, 2026 07:07

@allozaur
Contributor

Hmm, some checks are failing. Let's re-run them and make sure everything is working before merging.

@aldehir
Contributor Author

aldehir commented May 17, 2026

I've been seeing some of these tests fail on other PRs as well. I'll wait for all of them to complete.

@mbednarek360

Should the CMAKE flag in package.nix also be updated?

@aldehir
Contributor Author

aldehir commented May 17, 2026

Looks like only the WebGPU check is failing now, and it seems to be broken at the moment. Merging this in.

aldehir merged commit 8758904 into ggml-org:master May 17, 2026
84 of 95 checks passed
TheTom pushed a commit to TheTom/llama-cpp-turboquant that referenced this pull request May 17, 2026
Cherry-picked from upstream ggml-org/llama.cpp@87589042c (merged 2026-05-17).

option(LLAMA_BUILD_WEBUI ... ON) always leaves the deprecated flag DEFINED,
so the compat-block guard `AND NOT DEFINED LLAMA_BUILD_UI` never fires.
tools/ui/CMakeLists.txt then ORs both flags, so passing only the new
`-DLLAMA_BUILD_UI=OFF` was silently ignored. Removes the deprecated
options and simplifies the compat block + UI gate to a single flag.
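
The failure mode described above can be reproduced with a small standalone CMakeLists.txt. This is a reconstruction under assumptions: the exact form of the pre-fix guard and UI gate is inferred from this commit message, and the project name and help strings are made up.

```cmake
cmake_minimum_required(VERSION 3.14)
project(ui_flag_bug_sketch NONE)   # hypothetical project

# Both names declared with option(), as assumed for the pre-fix state.
option(LLAMA_BUILD_WEBUI "deprecated name (assumed help text)" ON)
option(LLAMA_BUILD_UI    "new name (assumed help text)"        ON)

# option() has already defined both variables at this point,
# so in this sketch the guard below is always false.
if (DEFINED LLAMA_BUILD_WEBUI AND NOT DEFINED LLAMA_BUILD_UI)
    set(LLAMA_BUILD_UI "${LLAMA_BUILD_WEBUI}")
    message(STATUS "compat block ran")
endif()

# ORing the flags: the deprecated default (ON) overrides -DLLAMA_BUILD_UI=OFF.
if (LLAMA_BUILD_UI OR LLAMA_BUILD_WEBUI)
    message(STATUS "UI would be built")
else()
    message(STATUS "UI would be skipped")
endif()
```

Configured with only -DLLAMA_BUILD_UI=OFF, the sketch still reports that the UI would be built, because LLAMA_BUILD_WEBUI keeps its ON default.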

Fixes the nix-sandbox build failure reported by @arch-fan and @pacak on
PR #146 — both hit the resulting xxd.cmake crash when an empty
tools/ui/dist/index.html was produced by failed npm + HF Bucket
provisioning. After this cherry-pick, `-DLLAMA_BUILD_UI=OFF` alone
works as documented.

Co-Authored-By: TheTom <tturney@psyguard.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request May 17, 2026
Round 2 of CI fixes addressing the remaining red jobs on the b9190 sync
PR. All were pre-existing TQ-tip bugs exposed by upstream CI's -Werror
policy (M5 Max + M2 mini local builds don't use -Werror).

1. ggml/src/ggml-cuda/fattn-mma-f16.cuh — fall back to ampere config
   (not zero-sentinel) in get_config_rdna
   ----------------------------------------------------------------
   Reverts the round-1 conflict choice. Round 1 took upstream's new
   sentinel `fattn_mma_config(32, 1, 0, 0, 0, 0, 0, false)` for the
   RDNA fallback. Template instances like
   fattn-mma-f16-instance-ncols1_1-ncols2_16.cu do constexpr arithmetic
   on the returned config (np = nwarps * cols_per_warp / ncols, etc).
   nwarps=0 from the sentinel propagates to np=0, triggering compile-
   time div/mod-by-zero at lines 1265/1371/1375/1512/1519/1572. HIP
   quality build is -Werror,-Wdivision-by-zero so it errors out.
   TQ-tip behavior (delegate to ampere) returns a valid config —
   restore it. Keeps all (640, 512) RDNA entries unioned in round 1.

2. ggml/src/ggml-cuda/vendors/musa.h — add cudaMemcpyFromSymbol alias
   ----------------------------------------------------------------
   turbo-quant.cuh InnerQ calibration uses both cudaMemcpyToSymbol AND
   cudaMemcpyFromSymbol. Round-1 fix added _ToSymbol; _FromSymbol was
   missed. Mirrors vendors/hip.h line 142.

3. src/llama-kv-cache.cpp — [[maybe_unused]] stubs + remove unused `il`
   ----------------------------------------------------------------
   The non-CUDA stubs (g_innerq_finalized, g_innerq_scale_inv_host,
   turbo_innerq_needs_tensor_update, turbo_innerq_mark_tensor_updated)
   are declared static, but every consumer is gated by #ifdef GGML_USE_CUDA,
   so the file-local copies look unused on non-CUDA builds. Annotate them
   with [[maybe_unused]]. Also drops two `const uint32_t il = layer.il;`
   locals in the state-save k/v writer loops where `il` was unreferenced:
   dead code left over from a removed logging pass.

4. scripts/xxd.cmake — defensive quote of ${hex_data}
   ----------------------------------------------------------------
   Belt-and-suspenders for the LLAMA_BUILD_UI nix-sandbox failure. The
   primary fix is the cherry-pick of upstream PR ggml-org#23190 (previous
   commit), which makes -DLLAMA_BUILD_UI=OFF actually work. This patch
   makes the underlying xxd.cmake robust: when an empty UI source file
   slips through, produce a 0-length .hpp instead of crashing with
   cmake's cryptic "string sub-command LENGTH requires two arguments"
   error. Worth proposing upstream as a follow-up.
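
The quoting issue in item 4 can be illustrated with a short script, runnable with `cmake -P`. It is a sketch, not the contents of scripts/xxd.cmake: `hex_data` is the variable named above, while `hex_len` and the file name are assumptions.

```cmake
# quote_sketch.cmake -- run with: cmake -P quote_sketch.cmake

# What file(READ ... HEX) leaves in the variable for an empty input file.
set(hex_data "")

# Unquoted, an empty value expands to no argument at all, so
#   string(LENGTH ${hex_data} hex_len)
# is seen by CMake as
#   string(LENGTH hex_len)
# and fails with "string sub-command LENGTH requires two arguments".

# Quoted, the empty value is passed as one empty-string argument.
string(LENGTH "${hex_data}" hex_len)
message(STATUS "hex_len = ${hex_len}")
```

With the quotes in place the script reports a length of 0 instead of aborting.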

Local Metal build green on M5 Max with all four fixes applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request May 17, 2026
arch-fan's next nix-sandbox build (after PR ggml-org#23190 cherry-pick + earlier
empty-input defensive quote) hit a different xxd.cmake failure:

  scripts/xxd.cmake:10 (file):
    file failed to open for reading (No such file or directory):
    /build/source/build/tools/ui/dist/bundle.js

Empty-file case (LENGTH error) was already handled by quoting the
variable. This is the sibling case: file READ itself fails when the
UI provisioning flow leaves an asset missing entirely (npm absent
AND HF Bucket download blocked → some assets created empty, some
not created at all).

Fix: early-return with a valid 0-byte symbol when ${INPUT} doesn't
exist. Also unify the empty-content path to emit {0} instead of {}
(a zero-element array initializer is a compiler extension in C++, not portable).
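
A rough sketch of that behavior as a standalone `cmake -P` script. It is not the actual scripts/xxd.cmake: the symbol names `asset` and `asset_len`, the helper `write_symbol`, and the OUTPUT variable are hypothetical, and only INPUT and `hex_data` come from the text above.

```cmake
# xxd_sketch.cmake
# Usage (full paths recommended):
#   cmake -DINPUT=/path/in.bin -DOUTPUT=/path/out.hpp -P xxd_sketch.cmake

# Hypothetical helper: writes the array and its length as C/C++ source.
function(write_symbol body byte_count)
    file(WRITE "${OUTPUT}"
        "unsigned char asset[] = {${body}};\n"
        "unsigned int asset_len = ${byte_count};\n")
endfunction()

# Missing input: emit a valid placeholder symbol and stop early.
if (NOT EXISTS "${INPUT}")
    write_symbol("0" 0)
    return()
endif()

file(READ "${INPUT}" hex_data HEX)
string(LENGTH "${hex_data}" hex_len)   # quoted so an empty file is safe

# Empty input: same placeholder, {0} rather than {}.
if (hex_len EQUAL 0)
    write_symbol("0" 0)
    return()
endif()

# Normal case: turn each pair of hex digits into "0xNN,".
string(REGEX REPLACE "([0-9a-f][0-9a-f])" "0x\\1," body "${hex_data}")
math(EXPR byte_count "${hex_len} / 2")
write_symbol("${body}" ${byte_count})
```

In this sketch the placeholder array holds a single zero byte while the reported length stays 0, so consumers that honor the length see an empty asset.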

Verified end-to-end on M5 Max by reproducing arch-fan's exact
conditions: build/tools/ui/dist/ removed, PATH stripped of npm,
LLAMA_USE_PREBUILT_UI=OFF. Without the fix, build crashes on
bundle.js.hpp generation. With the fix, all four .hpp files generate
as 0-byte symbols, llama-ui target completes cleanly, server builds
with LLAMA_UI_DEFAULT_ENABLED=0 (no embedded UI but no crash) —
exactly upstream's intended graceful degradation.

No effect on normal builds with UI assets present (regenerated all 4
.hpp files at original 26MB / 2.5MB / 34KB / 1.4KB sizes, byte-
identical to pre-fix output).

Worth proposing upstream as defensive hardening for the xxd helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
@Green-Sky
Collaborator

> Should the CMAKE flag in package.nix also be updated?

Nix is actually broken because it does not allow network access during build.

@candrews

> Should the CMAKE flag in package.nix also be updated?

> Nix is actually broken because it does not allow network access during build.

Gentoo is broken for the same reason.

Puqns67 added a commit to Puqns67/gentoo-zh that referenced this pull request May 18, 2026
@vbooka1

vbooka1 commented May 18, 2026

> Nix is actually broken because it does not allow network access during build.

> Gentoo is broken for the same reason.

you can download and copy WebUI files from an online machine: #23156 (comment)

Puqns67 added a commit to Puqns67/gentoo-zh that referenced this pull request May 18, 2026
peeweep pushed a commit to microcai/gentoo-zh that referenced this pull request May 18, 2026
kgrama pushed a commit to kgrama/llama.cpp that referenced this pull request May 19, 2026
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 19, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request May 19, 2026
dbrain pushed a commit to dbrain/hbd-llama-cpp-turboquant that referenced this pull request May 21, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
LogicDaemon pushed a commit to LogicDaemon/llama-cpp-turboquant that referenced this pull request May 27, 2026
LogicDaemon pushed a commit to LogicDaemon/llama-cpp-turboquant that referenced this pull request May 27, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
Moore2877 pushed a commit to Moore2877/llama-cpp-turboquant-synced that referenced this pull request May 31, 2026
Moore2877 pushed a commit to Moore2877/llama-cpp-turboquant-synced that referenced this pull request May 31, 2026
Moore2877 pushed a commit to Moore2877/llama-cpp-turboquant-synced that referenced this pull request May 31, 2026
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 2, 2026
* turboquant/HEAD: (82 commits)
  docs(readme): credit Google's original TurboQuant + explain the '+'
  docs(readme): fix turbo ladder ordering + cite K-compression paper
  docs(readme): reorder KV configs as a ladder + 'start light' guidance
  docs(readme): add Chronara to deployments + AtomicChat link
  docs: restructure README — professional layout, deployments, paper links
  docs: tighten README — add turbo2, missing features, paper links
  docs: keep upstream README, prepend fork-specific summary
  docs: replace upstream README with fork-specific summary
  fix(xxd.cmake): handle missing input file (not just empty)
  fix(ci): 4 cross-vendor -Werror failures + defensive xxd.cmake
  cmake : fix LLAMA_BUILD_UI logic (ggml-org#23190)
  fix(ggml-cuda): HIP nodiscard + MUSA cudaMemcpyToSymbol alias
  fix(turbo-quant): add forward declaration for turbo_cpu_fwht_inverse
  fix(metal): set ne12/ne13/r2/r3 function constants in mul_mm_tq_rotated pipeline
  webui: support video files as input (ggml-org#22830)
  server: (router) alloc tmp buffer on heap (ggml-org#23159)
  server: skip device enumeration in router mode to avoid creating CUDA primary context (ggml-org#23137)
  vulkan: removed duplicate #include <memory> in headers (ggml-org#23144)
  ui: Add request timeout for MCP tool calls (ggml-org#23138)
  sync : ggml
  ...
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 2, 2026
* turboquant:
  delected
  docs(readme): credit Google's original TurboQuant + explain the '+'
  docs(readme): fix turbo ladder ordering + cite K-compression paper
  docs(readme): reorder KV configs as a ladder + 'start light' guidance
  docs(readme): add Chronara to deployments + AtomicChat link
  docs: restructure README — professional layout, deployments, paper links
  docs: tighten README — add turbo2, missing features, paper links
  docs: keep upstream README, prepend fork-specific summary
  docs: replace upstream README with fork-specific summary
  fix(xxd.cmake): handle missing input file (not just empty)
  fix(ci): 4 cross-vendor -Werror failures + defensive xxd.cmake
  cmake : fix LLAMA_BUILD_UI logic (ggml-org#23190)
  fix(ggml-cuda): HIP nodiscard + MUSA cudaMemcpyToSymbol alias
  fix(turbo-quant): add forward declaration for turbo_cpu_fwht_inverse
  fix(metal): set ne12/ne13/r2/r3 function constants in mul_mm_tq_rotated pipeline
wel97459 pushed a commit to wel97459/llama-cpp-turboquant that referenced this pull request Jun 4, 2026
KGardevoir pushed a commit to KGardevoir/llama-cpp-turboquant that referenced this pull request Jun 5, 2026
7 participants