Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts #5302

Merged
danielhanchen merged 2 commits into main from fix/llama-prebuilt-linux-cpu-upstream
May 6, 2026

@danielhanchen
Member

Summary

studio/setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles (app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked roughly 30 releases looking for a non-existent app-*-linux-x64-cpu asset, exited the prebuilt planner with "no compatible Linux prebuilt asset was found", and fell through to a source build. Every free ubuntu-latest runner hits this, and every Linux laptop without an NVIDIA GPU pays a ~3 minute cmake + make cost on first install.

ggml-org/llama.cpp publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release, and studio/install_llama_prebuilt.py already knows how to fetch it. When called with --published-repo ggml-org/llama.cpp, the direct_upstream_release_plan branch guarded by host.is_linux and host.is_x86_64 and not host.has_usable_nvidia picks up that asset directly (install_llama_prebuilt.py:1313-1326). The bug was purely in the routing.
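
For orientation, the asset name above maps onto GitHub's standard release download URL layout. The helper below is a hypothetical illustration only: upstream_cpu_asset_url and the b0000 tag are made up for this sketch, and install_llama_prebuilt.py resolves assets through its own release planner, which is not reproduced here.

```shell
# Hypothetical helper (not part of studio/setup.sh): build the download URL
# for the upstream CPU tarball of a given release tag.
# $1 = release tag, $2 = optional repo (defaults to ggml-org/llama.cpp).
# Nothing here queries GitHub; the caller supplies the tag.
upstream_cpu_asset_url() {
    _asset_tag="$1"
    _asset_repo="${2:-ggml-org/llama.cpp}"
    printf 'https://github.com/%s/releases/download/%s/llama-%s-bin-ubuntu-x64.tar.gz\n' \
        "$_asset_repo" "$_asset_tag" "$_asset_tag"
}
```

Calling `upstream_cpu_asset_url b0000` prints the URL for the placeholder tag b0000; a real tag has to come from the release list.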

Fix

Tighten the gate in studio/setup.sh so a Linux host routes to ggml-org/llama.cpp only when it is x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo, amd-smi, hipconfig, hipinfo). Everything else stays on the current path.

_LINUX_HAS_GPU=false
for _GPU_TOOL in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
    if command -v "$_GPU_TOOL" >/dev/null 2>&1; then
        _LINUX_HAS_GPU=true
        break
    fi
done

if [ "$_HOST_SYSTEM" = "Darwin" ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
elif [ "$_HOST_SYSTEM" = "Linux" ] \
        && [ "$_HOST_MACHINE" = "x86_64" ] \
        && [ "$_LINUX_HAS_GPU" = false ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
else
    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
fi

Routing matrix

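Only the CPU-only Linux x86_64 route changes. It appears in two rows below, because Intel / Vulkan / SYCL hosts have none of the probed NVIDIA / AMD tools. Every other row is unchanged.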

| Host + accelerator | Today | After this PR |
| --- | --- | --- |
| macOS arm64 / x86_64 | ggml-org bin-macos-{arm64,x64}.tar.gz | unchanged |
| Windows + CPU | ggml-org bin-win-cpu-x64.zip (via setup.ps1) | unchanged |
| Windows + CUDA | ggml-org bin-win-cuda-*.zip | unchanged |
| Windows + ROCm / HIP | ggml-org bin-win-hip-radeon-x64.zip | unchanged |
| Linux + CPU x86_64 | source build (~3 min) | ggml-org bin-ubuntu-x64.tar.gz (~10 sec) |
| Linux + CUDA | unslothai/llama.cpp app-*-linux-x64-cuda*.tar.gz | unchanged (nvidia-smi -> unslothai) |
| Linux + ROCm / AMD | source build with -DGGML_HIP=ON | unchanged (rocminfo / amd-smi / hipconfig / hipinfo -> unslothai) |
| Linux + Intel / Vulkan / SYCL | source build (CPU output) | upstream CPU asset (functionally equivalent, faster) |
| Linux arm64 / s390x | source build | unchanged (not x86_64 -> unslothai -> source build) |

Linux ROCm fast-path (bin-ubuntu-rocm-7.2-x64.tar.gz) is intentionally out of scope for this PR. The richer code path resolve_upstream_asset_choice at install_llama_prebuilt.py:3124-3183 already knows how to pick the right ROCm minor against the host runtime; porting that into direct_upstream_release_plan so AMD users on Linux also get a prebuilt is a clean follow-up.

Verification

Locally exercised the gate against synthetic PATHs covering every combination above. Routing matches the matrix, including the corner cases (MINGW* uname, aarch64 with NVIDIA tools, Darwin with NVIDIA tools).

bash -n studio/setup.sh passes.
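
A check along these lines can be reproduced with stub executables on a throwaway PATH. The sketch below is an assumption about how such a harness might look, not the script that was actually used: resolve_release_repo and check_route are hypothetical names, and the gate logic is copied from the Fix section into a function so it can take synthetic uname values.

```shell
# Same decision as the gate in the Fix section, wrapped in a function.
# $1 = uname -s value, $2 = uname -m value. Prints the release repo.
resolve_release_repo() {
    _gate_has_gpu=false
    for _gate_tool in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
        if command -v "$_gate_tool" >/dev/null 2>&1; then
            _gate_has_gpu=true
            break
        fi
    done
    if [ "$1" = "Darwin" ]; then
        echo "ggml-org/llama.cpp"
    elif [ "$1" = "Linux" ] && [ "$2" = "x86_64" ] \
            && [ "$_gate_has_gpu" = false ]; then
        echo "ggml-org/llama.cpp"
    else
        echo "unslothai/llama.cpp"
    fi
}

# check_route TOOLS SYSTEM MACHINE EXPECTED
# Creates a temporary directory holding stub executables named in TOOLS,
# evaluates the gate with PATH pointing only at that directory, and
# reports whether the result matches EXPECTED.
check_route() {
    _cr_tools="$1"; _cr_system="$2"; _cr_machine="$3"; _cr_expected="$4"
    _cr_dir="$(mktemp -d)"
    for _cr_stub in $_cr_tools; do
        printf '#!/bin/sh\nexit 0\n' > "$_cr_dir/$_cr_stub"
        chmod +x "$_cr_dir/$_cr_stub"
    done
    # The PATH override only applies inside the command substitution.
    _cr_actual="$(PATH="$_cr_dir" resolve_release_repo "$_cr_system" "$_cr_machine")"
    rm -rf "$_cr_dir"
    if [ "$_cr_actual" = "$_cr_expected" ]; then
        echo "ok   [$_cr_tools] $_cr_system/$_cr_machine -> $_cr_actual"
    else
        echo "FAIL [$_cr_tools] $_cr_system/$_cr_machine -> $_cr_actual (expected $_cr_expected)"
        return 1
    fi
}

check_route ""           Linux      x86_64  ggml-org/llama.cpp
check_route "nvidia-smi" Linux      x86_64  unslothai/llama.cpp
check_route "rocminfo"   Linux      x86_64  unslothai/llama.cpp
check_route "nvidia-smi" Linux      aarch64 unslothai/llama.cpp
check_route "nvidia-smi" Darwin     arm64   ggml-org/llama.cpp
check_route ""           MINGW64_NT x86_64  unslothai/llama.cpp
```

Because the gate uses only shell builtins, pointing PATH at a directory of stubs is enough to simulate each host without touching the real system.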

Test plan

  • Free ubuntu-latest runner on a CPU job: install.sh log shows prebuilt installed and validated rather than falling back to source build.
  • Linux NVIDIA host: still resolves to unslothai/llama.cpp and picks the existing CUDA bundle.
  • Linux ROCm host: still resolves to unslothai/llama.cpp, prebuilt resolution fails as before, and falls through to the existing source build with -DGGML_HIP=ON.
  • macOS arm64: unchanged (still ggml-org/llama.cpp via the Darwin branch).

setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every
non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles
(app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked ~30
releases looking for a non-existent app-*-linux-x64-cpu asset, exited
the prebuilt planner with "no compatible Linux prebuilt asset was
found", and fell through to a source build. Free CI runners
(ubuntu-latest with no GPU) hit this on every install, and anyone
running Studio on a Linux laptop without an NVIDIA GPU paid the
~3 minute cmake+make cost on first install.

ggml-org publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release
and install_llama_prebuilt.py already knows how to fetch it: when
called with --published-repo ggml-org/llama.cpp, the Linux x86_64 +
not has_usable_nvidia branch in direct_upstream_release_plan picks up
that asset directly. The fix is purely on the routing side.

Tighten the gate so a Linux host routes to ggml-org only when it is
x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo,
amd-smi, hipconfig, hipinfo). Everything else stays on the current
path:

  - macOS: already on ggml-org, unchanged
  - Windows: already on ggml-org via setup.ps1, unchanged
  - Linux CUDA: nvidia-smi present -> unslothai/llama.cpp, unchanged
  - Linux ROCm: rocminfo / amd-smi / hipconfig / hipinfo present
                -> unslothai/llama.cpp -> source build with HIP,
                unchanged
  - Linux Intel / Vulkan / SYCL: no NVIDIA / AMD tools, hits the new
                ggml-org route, gets upstream CPU asset (same as
                today's source-build CPU output, ~3 min faster)
  - Linux arm64 / s390x: not x86_64 -> unslothai/llama.cpp ->
                source build, unchanged

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c84a012847

Comment thread studio/setup.sh
Comment on lines +605 to +606
    if command -v "$_GPU_TOOL" >/dev/null 2>&1; then
        _LINUX_HAS_GPU=true

P2: Probe usable GPUs instead of tool presence

On Linux x86_64 CPU-only environments that still have GPU utilities on PATH, such as CUDA-based Docker images run without --gpus or hosts with CUDA_VISIBLE_DEVICES hiding all devices, this command -v nvidia-smi check routes setup back to unslothai/llama.cpp. The Python installer already distinguishes this case as has_usable_nvidia=false, but with the unsloth repo it then scans CUDA-only Linux assets and falls back to a source build, so the new CPU prebuilt fast path is skipped exactly for these CPU-only installs. Please make this gate use the same active GPU probing semantics as detect_host() or defer the routing until after that detection.
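
One possible shape for such a probe, sketched in shell. This is an assumption, not what detect_host() does (its implementation is not shown in this thread): linux_has_usable_nvidia is a hypothetical name, the check relies on `nvidia-smi -L` listing one "GPU <n>: ..." line per device, and it treats an empty or -1 CUDA_VISIBLE_DEVICES as "no devices".

```shell
# Hypothetical helper: return 0 only when an NVIDIA GPU looks usable,
# not merely when the nvidia-smi binary exists on PATH.
linux_has_usable_nvidia() {
    # Devices explicitly hidden from CUDA: treat the host as CPU-only.
    # (Assumption: empty string and -1 both mean "no visible devices".)
    if [ "${CUDA_VISIBLE_DEVICES+set}" = set ]; then
        case "$CUDA_VISIBLE_DEVICES" in
            ""|-1) return 1 ;;
        esac
    fi
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # nvidia-smi -L lists devices and exits non-zero when it cannot
    # reach a driver, for example in a container started without GPUs.
    _nv_list="$(nvidia-smi -L 2>/dev/null)" || return 1
    case "$_nv_list" in
        GPU\ [0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the gate, a call to this function would replace the plain `command -v nvidia-smi` test; the AMD tools would need an equivalent active probe, which this sketch does not attempt.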

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates studio/setup.sh to improve the selection of prebuilt binaries for CPU-only Linux x86_64 hosts by routing them to the ggml-org/llama.cpp repository. This prevents these hosts from attempting to download non-existent CPU assets from the Unsloth repository and falling back to source builds. The reviewer suggested combining the conditional logic for Darwin and CPU-only Linux into a single block using modern Bash syntax to improve code conciseness.

Comment thread studio/setup.sh
Comment on lines 611 to 619
if [ "$_HOST_SYSTEM" = "Darwin" ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
elif [ "$_HOST_SYSTEM" = "Linux" ] \
        && [ "$_HOST_MACHINE" = "x86_64" ] \
        && [ "$_LINUX_HAS_GPU" = false ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
else
    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
fi

medium

To improve conciseness and avoid repeating the assignment to _HELPER_RELEASE_REPO, you can combine the conditions for Darwin and CPU-only Linux into a single if block. Using the [[ ... ]] construct is also more modern and readable for complex conditions in Bash.

Suggested change
- if [ "$_HOST_SYSTEM" = "Darwin" ]; then
-     _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
- elif [ "$_HOST_SYSTEM" = "Linux" ] \
-         && [ "$_HOST_MACHINE" = "x86_64" ] \
-         && [ "$_LINUX_HAS_GPU" = false ]; then
-     _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
- else
-     _HELPER_RELEASE_REPO="unslothai/llama.cpp"
- fi
+ if [[ "$_HOST_SYSTEM" == "Darwin" || ( "$_HOST_SYSTEM" == "Linux" && "$_HOST_MACHINE" == "x86_64" && "$_LINUX_HAS_GPU" == false ) ]]; then
+     _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
+ else
+     _HELPER_RELEASE_REPO="unslothai/llama.cpp"
+ fi

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 449f84fb57

Comment thread studio/setup.sh
# x86_64 routes to ggml-org for bin-ubuntu-x64.tar.gz. Anything with a
# GPU tool installed stays on unslothai (CUDA bundle / ROCm source build).
_LINUX_HAS_GPU=false
for _GPU_TOOL in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do

P2: Include compiler probes in GPU routing

On Linux x86_64 hosts where the CUDA/ROCm compiler is installed but these runtime utilities are not on PATH (for example nvcc under /usr/local/cuda/bin or hipcc under /opt/rocm/bin), this loop leaves _LINUX_HAS_GPU=false and the new branch installs the upstream CPU tarball successfully. That suppresses the existing source-build path that explicitly checks those compiler locations and enables -DGGML_CUDA=ON / -DGGML_HIP=ON later in this same script, so those environments silently lose GPU-enabled llama.cpp instead of building it as before.
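
A minimal sketch of what adding compiler probes could look like, assuming the two install prefixes named above. linux_has_gpu_toolchain, CUDA_ROOT_HINT and ROCM_ROOT_HINT are hypothetical names introduced for illustration; the locations the existing source-build path actually checks in studio/setup.sh are not reproduced here.

```shell
# Hypothetical helper: return 0 when any GPU runtime tool or GPU compiler
# is found, either on PATH or under a conventional install prefix.
# CUDA_ROOT_HINT / ROCM_ROOT_HINT are illustrative overrides only.
linux_has_gpu_toolchain() {
    # Runtime tools from the merged gate, plus the two compilers.
    for _tc_tool in nvidia-smi rocminfo amd-smi hipconfig hipinfo nvcc hipcc; do
        if command -v "$_tc_tool" >/dev/null 2>&1; then
            return 0
        fi
    done
    # Compilers are often installed outside PATH; check the usual prefixes.
    for _tc_compiler in \
        "${CUDA_ROOT_HINT:-/usr/local/cuda}/bin/nvcc" \
        "${ROCM_ROOT_HINT:-/opt/rocm}/bin/hipcc"; do
        if [ -x "$_tc_compiler" ]; then
            return 0
        fi
    done
    return 1
}
```

With this in place, a host that has only nvcc under /usr/local/cuda/bin would stay on the unslothai route and keep its GPU-enabled source build.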

@danielhanchen danielhanchen merged commit 7de1f4c into main May 6, 2026
6 checks passed
@danielhanchen danielhanchen deleted the fix/llama-prebuilt-linux-cpu-upstream branch May 6, 2026 06:22
danielhanchen added a commit that referenced this pull request May 6, 2026
* Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts

* Tighten routing comment in studio/setup.sh