Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts by danielhanchen · Pull Request #5302 · unslothai/unsloth

danielhanchen · 2026-05-06T06:12:25Z

Summary

studio/setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles (app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked roughly 30 releases looking for a non-existent app-*-linux-x64-cpu asset, exited the prebuilt planner with "no compatible Linux prebuilt asset was found", and fell through to a source build. Every free ubuntu-latest runner hits this on every install, and every Linux laptop without an NVIDIA GPU pays a ~3 minute cmake + make cost on first install.

ggml-org/llama.cpp publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release, and studio/install_llama_prebuilt.py already knows how to fetch it: when called with --published-repo ggml-org/llama.cpp, the direct_upstream_release_plan branch at host.is_linux and host.is_x86_64 and not host.has_usable_nvidia picks up that asset directly (install_llama_prebuilt.py:1313-1326). The bug was purely in the routing.
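As a rough illustration of the asset naming described above, the sketch below builds the download URL for the upstream CPU tarball from a release tag. The helper name upstream_cpu_asset_url and the tag b0000 are placeholders invented for this example. The URL layout assumes GitHub's usual releases/download/<tag>/<asset> scheme; install_llama_prebuilt.py may resolve the asset differently (for example through the releases API).

```shell
# Hypothetical helper, not part of studio/setup.sh or install_llama_prebuilt.py.
# Builds the URL of llama-<tag>-bin-ubuntu-x64.tar.gz for a given release tag.
upstream_cpu_asset_url() {
    tag="$1"
    repo="ggml-org/llama.cpp"
    asset="llama-${tag}-bin-ubuntu-x64.tar.gz"
    echo "https://github.com/${repo}/releases/download/${tag}/${asset}"
}

# b0000 is a placeholder tag, not a real release.
upstream_cpu_asset_url b0000
```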

Fix

Tighten the gate in studio/setup.sh so a Linux host routes to ggml-org/llama.cpp only when it is x86_64 and none of the GPU detection tools (nvidia-smi, rocminfo, amd-smi, hipconfig, hipinfo) is on PATH. Everything else stays on the current path.

_LINUX_HAS_GPU=false
for _GPU_TOOL in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
    if command -v "$_GPU_TOOL" >/dev/null 2>&1; then
        _LINUX_HAS_GPU=true
        break
    fi
done

if [ "$_HOST_SYSTEM" = "Darwin" ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
elif [ "$_HOST_SYSTEM" = "Linux" ] \
        && [ "$_HOST_MACHINE" = "x86_64" ] \
        && [ "$_LINUX_HAS_GPU" = false ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
else
    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
fi
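The same decision can be sketched as a pure function, which makes the routing checkable without touching the real host. The function name pick_release_repo and its argument order are hypothetical and do not appear in studio/setup.sh; only the branch logic is copied from the gate above.

```shell
# Hypothetical wrapper around the gate; not part of studio/setup.sh.
# Args: <uname -s value> <uname -m value> <has_gpu: true|false>
pick_release_repo() {
    if [ "$1" = "Darwin" ]; then
        echo "ggml-org/llama.cpp"
    elif [ "$1" = "Linux" ] \
            && [ "$2" = "x86_64" ] \
            && [ "$3" = false ]; then
        echo "ggml-org/llama.cpp"
    else
        echo "unslothai/llama.cpp"
    fi
}

pick_release_repo Linux x86_64 false    # prints ggml-org/llama.cpp
pick_release_repo Linux x86_64 true     # prints unslothai/llama.cpp
pick_release_repo Linux aarch64 false   # prints unslothai/llama.cpp
```

Keeping the host facts as arguments rather than globals is only a convenience for testing; the merged script sets _HELPER_RELEASE_REPO inline.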

Routing matrix

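Only the CPU-only Linux x86_64 route flips; Intel / Vulkan / SYCL hosts ride along with it because they have no NVIDIA or AMD tool on PATH. Every other row is unchanged.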

Host + accelerator	Today	After this PR
macOS arm64 / x86_64	ggml-org `bin-macos-{arm64,x64}.tar.gz`	unchanged
Windows + CPU	ggml-org `bin-win-cpu-x64.zip` (via setup.ps1)	unchanged
Windows + CUDA	ggml-org `bin-win-cuda-*.zip`	unchanged
Windows + ROCm / HIP	ggml-org `bin-win-hip-radeon-x64.zip`	unchanged
Linux + CPU x86_64	source build (~3 min)	ggml-org `bin-ubuntu-x64.tar.gz` (~10 sec)
Linux + CUDA	unslothai/llama.cpp `app-*-linux-x64-cuda*.tar.gz`	unchanged (nvidia-smi -> unslothai)
Linux + ROCm / AMD	source build with `-DGGML_HIP=ON`	unchanged (rocminfo / amd-smi / hipconfig / hipinfo -> unslothai)
Linux + Intel / Vulkan / SYCL	source build (CPU output)	upstream CPU asset (functionally equivalent, faster)
Linux arm64 / s390x	source build	unchanged (not x86_64 -> unslothai -> source build)

Linux ROCm fast-path (bin-ubuntu-rocm-7.2-x64.tar.gz) is intentionally out of scope for this PR. The richer code path resolve_upstream_asset_choice at install_llama_prebuilt.py:3124-3183 already knows how to pick the right ROCm minor against the host runtime; porting that into direct_upstream_release_plan so AMD users on Linux also get a prebuilt is a clean follow-up.

Verification

Locally exercised the gate against synthetic PATHs covering every combination above. Routing matches the matrix, including the corner cases (MINGW* uname, aarch64 with NVIDIA tools, Darwin with NVIDIA tools).

bash -n studio/setup.sh passes.
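A synthetic-PATH check of the kind described above might look like the following sketch. It is not the script that was actually run for this PR: detect_gpu_tool is a hypothetical wrapper around the detection loop, and the fake rocminfo is an empty stub that exists only so that command -v finds it.

```shell
# Hypothetical wrapper mirroring the detection loop in studio/setup.sh.
detect_gpu_tool() {
    for tool in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo true
            return 0
        fi
    done
    echo false
}

fake_bin="$(mktemp -d)"

# Case 1: PATH points at an empty directory, so no GPU tool is visible.
cpu_only="$(PATH="$fake_bin"; detect_gpu_tool)"

# Case 2: drop an executable stub named rocminfo into that directory.
printf '#!/bin/sh\nexit 0\n' > "$fake_bin/rocminfo"
chmod +x "$fake_bin/rocminfo"
with_tool="$(PATH="$fake_bin"; detect_gpu_tool)"

rm -rf "$fake_bin"
echo "empty PATH: $cpu_only, fake rocminfo: $with_tool"
```

PATH is overridden inside the command substitution's subshell, so the surrounding shell keeps its real PATH for mktemp, chmod, and rm.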

Test plan

Free ubuntu-latest runner on a CPU job: install.sh log shows prebuilt installed and validated rather than falling back to source build.
Linux NVIDIA host: still resolves to unslothai/llama.cpp and picks the existing CUDA bundle.
Linux ROCm host: still resolves to unslothai/llama.cpp, prebuilt resolution fails as before, and falls through to the existing source build with -DGGML_HIP=ON.
macOS arm64: unchanged (still ggml-org/llama.cpp via the Darwin branch).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c84a012847

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-06T06:13:57Z

+    if command -v "$_GPU_TOOL" >/dev/null 2>&1; then
+        _LINUX_HAS_GPU=true


Probe usable GPUs instead of tool presence

On Linux x86_64 CPU-only environments that still have GPU utilities on PATH, such as CUDA-based Docker images run without --gpus or hosts with CUDA_VISIBLE_DEVICES hiding all devices, this command -v nvidia-smi check routes setup back to unslothai/llama.cpp. The Python installer already distinguishes this case as has_usable_nvidia=false, but with the unsloth repo it then scans CUDA-only Linux assets and falls back to a source build, so the new CPU prebuilt fast path is skipped exactly for these CPU-only installs. Please make this gate use the same active GPU probing semantics as detect_host() or defer the routing until after that detection.
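One way to probe for a usable device rather than tool presence is sketched below. This is an assumption about what such a gate could look like, not the logic in detect_host(): the function name nvidia_gpu_usable is hypothetical, the check relies on nvidia-smi -L printing one "GPU <n>: ..." line per visible device, and an explicitly empty CUDA_VISIBLE_DEVICES is treated as "no devices" by choice, since nvidia-smi itself does not apply that variable.

```shell
# Hypothetical probe; detect_host() in install_llama_prebuilt.py may differ.
nvidia_gpu_usable() {
    # Assumption: an explicitly empty CUDA_VISIBLE_DEVICES hides every device.
    if [ "${CUDA_VISIBLE_DEVICES+set}" = set ] && [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        return 1
    fi
    # Tool missing from PATH: nothing to probe.
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # Succeeds only if at least one device line is listed.
    nvidia-smi -L 2>/dev/null | grep -q '^GPU [0-9]'
}

if nvidia_gpu_usable; then
    echo "usable NVIDIA GPU detected"
else
    echo "no usable NVIDIA GPU"
fi
```

With a check like this, a CUDA Docker image started without --gpus would report no usable GPU even though nvidia-smi is on PATH.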

gemini-code-assist

Code Review

This pull request updates studio/setup.sh to improve the selection of prebuilt binaries for CPU-only Linux x86_64 hosts by routing them to the ggml-org/llama.cpp repository. This prevents these hosts from attempting to download non-existent CPU assets from the Unsloth repository and falling back to source builds. The reviewer suggested combining the conditional logic for Darwin and CPU-only Linux into a single block using modern Bash syntax to improve code conciseness.

gemini-code-assist · 2026-05-06T06:16:39Z

 if [ "$_HOST_SYSTEM" = "Darwin" ]; then
    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
+elif [ "$_HOST_SYSTEM" = "Linux" ] \
+        && [ "$_HOST_MACHINE" = "x86_64" ] \
+        && [ "$_LINUX_HAS_GPU" = false ]; then
+    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
 else
    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
 fi


To improve conciseness and avoid repeating the assignment to _HELPER_RELEASE_REPO, you can combine the conditions for Darwin and CPU-only Linux into a single if block. Using the [[ ... ]] construct is also more modern and readable for complex conditions in Bash.

Suggested change

-if [ "$_HOST_SYSTEM" = "Darwin" ]; then
-    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
-elif [ "$_HOST_SYSTEM" = "Linux" ] \
-        && [ "$_HOST_MACHINE" = "x86_64" ] \
-        && [ "$_LINUX_HAS_GPU" = false ]; then
-    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
-else
-    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
-fi
+if [[ "$_HOST_SYSTEM" == "Darwin" || ( "$_HOST_SYSTEM" == "Linux" && "$_HOST_MACHINE" == "x86_64" && "$_LINUX_HAS_GPU" == false ) ]]; then
+    _HELPER_RELEASE_REPO="ggml-org/llama.cpp"
+else
+    _HELPER_RELEASE_REPO="unslothai/llama.cpp"
+fi

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 449f84fb57


chatgpt-codex-connector · 2026-05-06T06:18:48Z

+# x86_64 routes to ggml-org for bin-ubuntu-x64.tar.gz. Anything with a
+# GPU tool installed stays on unslothai (CUDA bundle / ROCm source build).
+_LINUX_HAS_GPU=false
+for _GPU_TOOL in nvidia-smi rocminfo amd-smi hipconfig hipinfo; do


Include compiler probes in GPU routing

On Linux x86_64 hosts where the CUDA/ROCm compiler is installed but these runtime utilities are not on PATH (for example nvcc under /usr/local/cuda/bin or hipcc under /opt/rocm/bin), this loop leaves _LINUX_HAS_GPU=false and the new branch installs the upstream CPU tarball successfully. That suppresses the existing source-build path that explicitly checks those compiler locations and enables -DGGML_CUDA=ON / -DGGML_HIP=ON later in this same script, so those environments silently lose GPU-enabled llama.cpp instead of building it as before.
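A sketch of how the loop could be widened to cover the compiler locations named in this comment follows. It is illustrative only and was not part of the merged change: the function name linux_has_gpu_toolchain and its two optional root arguments are hypothetical, and the defaults /usr/local/cuda and /opt/rocm are taken from the examples in the comment rather than from the source-build section of studio/setup.sh.

```shell
# Hypothetical extension of the detection loop; not in studio/setup.sh.
# Args (optional): <cuda root> <rocm root>, so tests can use scratch dirs.
linux_has_gpu_toolchain() {
    cuda_root="${1:-/usr/local/cuda}"
    rocm_root="${2:-/opt/rocm}"
    # Runtime utilities from the PR, plus the compilers when they are on PATH.
    for tool in nvidia-smi rocminfo amd-smi hipconfig hipinfo nvcc hipcc; do
        if command -v "$tool" >/dev/null 2>&1; then
            return 0
        fi
    done
    # Compilers installed outside PATH, as in the review comment's examples.
    [ -x "$cuda_root/bin/nvcc" ] || [ -x "$rocm_root/bin/hipcc" ]
}

if linux_has_gpu_toolchain; then
    _LINUX_HAS_GPU=true
else
    _LINUX_HAS_GPU=false
fi
echo "_LINUX_HAS_GPU=$_LINUX_HAS_GPU"
```

Under this variant a host with only nvcc or hipcc installed would stay on the unslothai route and keep its GPU-enabled source build.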

* Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts

  setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles (app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked ~30 releases looking for a non-existent app-*-linux-x64-cpu asset, exited the prebuilt planner with "no compatible Linux prebuilt asset was found", and fell through to a source build. Free CI runners (ubuntu-latest with no GPU) hit this on every install, and anyone running Studio on a Linux laptop without an NVIDIA GPU paid the ~3 minute cmake+make cost on first install.

  ggml-org publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release and install_llama_prebuilt.py already knows how to fetch it: when called with --published-repo ggml-org/llama.cpp, the Linux x86_64 + not has_usable_nvidia branch in direct_upstream_release_plan picks up that asset directly.

  The fix is purely on the routing side. Tighten the gate so a Linux host routes to ggml-org only when it is x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo, amd-smi, hipconfig, hipinfo). Everything else stays on the current path:

  - macOS: already on ggml-org, unchanged
  - Windows: already on ggml-org via setup.ps1, unchanged
  - Linux CUDA: nvidia-smi present -> unslothai/llama.cpp, unchanged
  - Linux ROCm: rocminfo / amd-smi / hipconfig / hipinfo present -> unslothai/llama.cpp -> source build with HIP, unchanged
  - Linux Intel / Vulkan / SYCL: no NVIDIA / AMD tools, hits the new ggml-org route, gets upstream CPU asset (same as today's source-build CPU output, ~3 min faster)
  - Linux arm64 / s390x: not x86_64 -> unslothai/llama.cpp -> source build, unchanged

* Tighten routing comment in studio/setup.sh

danielhanchen mentioned this pull request May 6, 2026

Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke #5298

Merged

4 tasks

chatgpt-codex-connector Bot reviewed May 6, 2026

gemini-code-assist Bot reviewed May 6, 2026

Tighten routing comment in studio/setup.sh

449f84f

chatgpt-codex-connector Bot reviewed May 6, 2026

danielhanchen merged commit 7de1f4c into main May 6, 2026
6 checks passed

danielhanchen deleted the fix/llama-prebuilt-linux-cpu-upstream branch May 6, 2026 06:22

danielhanchen merged 2 commits into main from fix/llama-prebuilt-linux-cpu-upstream
