ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device#23007

Merged
JohannesGaessler merged 1 commit into
ggml-org:masterfrom
fl0rianr:fix/auto-apply_iGPU_flag
May 28, 2026

@fl0rianr
Contributor

Overview

Report CUDA/HIP devices as GGML_BACKEND_DEVICE_TYPE_IGPU when the runtime reports
cudaDeviceProp::integrated.

This intentionally checks cudaDeviceProp directly instead of
ggml_cuda_info().devices[id].integrated, since the latter is temporarily disabled
by design for CUDA buffer allocation behavior due to #15034. This change therefore only affects the automatic backend
device classification and does not re-enable the integrated-buffer path.
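As a rough illustration of the classification described above, here is a minimal sketch and not the actual patch. `device_props_sketch`, `device_type_sketch`, `classify_device` and `device_type_name` are hypothetical names introduced for this example. The struct stands in for the single `integrated` field of `cudaDeviceProp`, which in real code would be filled in by `cudaGetDeviceProperties`.

```cpp
#include <string>

// Hypothetical stand-in for the one cudaDeviceProp field this sketch needs.
// In real code the value would come from cudaGetDeviceProperties().
struct device_props_sketch {
    int integrated; // non-zero when the runtime reports an integrated device
};

// Local mirror of the two relevant device types; not the real ggml enum.
enum class device_type_sketch { GPU, IGPU };

// Classify a device purely from the runtime-reported property, without
// consulting any cached per-device "integrated" flag.
device_type_sketch classify_device(const device_props_sketch & prop) {
    return prop.integrated != 0 ? device_type_sketch::IGPU
                                : device_type_sketch::GPU;
}

// Map the sketch enum to the ggml constant name it is meant to represent.
std::string device_type_name(device_type_sketch t) {
    return t == device_type_sketch::IGPU ? "GGML_BACKEND_DEVICE_TYPE_IGPU"
                                         : "GGML_BACKEND_DEVICE_TYPE_GPU";
}
```

Because the sketch reads only the runtime property, it is independent of whatever the buffer allocation code does with its own integrated flag, which is the separation the overview describes.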

Requirements

@fl0rianr fl0rianr requested a review from a team as a code owner May 13, 2026 09:02
@github-actions github-actions Bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels May 13, 2026
@0cc4m
Contributor

0cc4m commented May 13, 2026

It's good to see the IGPU type expand to other backends. We can use it to adapt downstream behaviour according to the device type, e.g. to disable mmap on integrated GPUs.

@fl0rianr fl0rianr changed the title ggml: auto apply iGPU flag for CUDA/HIP if integrated device ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device May 13, 2026
Contributor

@JohannesGaessler JohannesGaessler left a comment

My understanding is that the only change this should currently make in terms of program logic is which ggml backend devices llama.cpp prioritizes.

@0cc4m
Contributor

0cc4m commented May 17, 2026

Yes, currently this only affects situations where you have both an integrated and one or multiple dedicated GPUs available. In that case it defaults to the dedicated GPU(s), not the integrated one. It can be overridden with --devices.
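The default described in this comment can be sketched roughly as follows. This is an illustrative assumption about the behaviour, not llama.cpp's actual selection code; `dev_kind`, `dev_sketch` and `pick_default_devices` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical device description used only for this sketch.
enum class dev_kind { GPU, IGPU };

struct dev_sketch {
    std::string name;
    dev_kind    kind;
};

// Prefer dedicated GPUs; fall back to integrated GPUs only when no
// dedicated GPU is available.
std::vector<dev_sketch> pick_default_devices(const std::vector<dev_sketch> & all) {
    std::vector<dev_sketch> gpus;
    std::vector<dev_sketch> igpus;
    for (const auto & d : all) {
        if (d.kind == dev_kind::GPU) {
            gpus.push_back(d);
        } else {
            igpus.push_back(d);
        }
    }
    return gpus.empty() ? igpus : gpus;
}
```

With one integrated and one dedicated device in the input, only the dedicated one is returned. With only an integrated device, that device is returned. An explicit `--devices` list would bypass a default like this entirely.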

@fl0rianr
Contributor Author

Yes, it only has an effect if something acts on that flag. At the moment llama.cpp does nothing with it besides de-duplication, which is not critical for iGPUs IMHO.

@JohannesGaessler JohannesGaessler merged commit 30af6e2 into ggml-org:master May 28, 2026
48 checks passed
adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 28, 2026
* origin/master: (32 commits)
hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (ggml-org#23835)
mtmd-debug: add color and rainbow mode (ggml-org#23829)
mtmd: fix gemma 4 projector pre_norm (ggml-org#23822)
opencl: move backend info printing into its own function (ggml-org#23702)
ci : run ui publish on ubuntu-slim (ggml-org#23818)
ui: fix audio and video modality detection (ggml-org#23756)
ci : releases use Github-hosted builds for the UI (ggml-org#23823)
app : improve help output (ggml-org#23805)
mtmd: n_head_kv defaults to n_head (ggml-org#23782)
mtmd: fix gemma 4 audio rms norm eps (ggml-org#23815)
ci : change Vulkan builds to Release to reduce ccache (ggml-org#23820)
arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (ggml-org#23167)
test-llama-archs: fix table format [no release] (ggml-org#23810)
ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007)
mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (ggml-org#23729)
CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (ggml-org#23227)
server: minor tweaks to use more cpp features (ggml-org#23785)
hexagon: minor refresh for HMX FA and MM (ggml-org#23796)
vulkan: fast path for walsh-hadamard transform (ggml-org#23687)
chat : add Granite 4.1 chat template (ggml-org#23518)
...
rgerganov added a commit that referenced this pull request May 30, 2026
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device
selection logic dropped the local iGPU whenever any RPC server was added,
because RPC devices made `model->devices` non-empty. On systems where the
"iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified
memory), this caused all tensors to be allocated on the RPC peer alone and
model loading to fail.

Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer
suppress the local iGPU.

closes: #23858
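
The difference between the two gates in this commit message can be sketched as below. This is a simplified model written for illustration, not the code from the fix; `sel_kind`, `sel_device`, `select_devices` and the `gate_on_gpus_only` switch are hypothetical, and the real selection logic may differ in ordering and detail.

```cpp
#include <string>
#include <vector>

// Hypothetical device kinds for this sketch only.
enum class sel_kind { RPC, GPU, IGPU };

struct sel_device {
    std::string name;
    sel_kind    kind;
};

// gate_on_gpus_only == false models the old gate: the iGPU is added only
// when nothing at all has been selected, so an RPC peer suppresses it.
// gate_on_gpus_only == true models the new gate: the iGPU is added
// whenever there is no dedicated local GPU, regardless of RPC peers.
std::vector<sel_device> select_devices(const std::vector<sel_device> & available,
                                       bool gate_on_gpus_only) {
    std::vector<sel_device> rpc, gpus, igpus;
    for (const auto & d : available) {
        switch (d.kind) {
            case sel_kind::RPC:  rpc.push_back(d);   break;
            case sel_kind::GPU:  gpus.push_back(d);  break;
            case sel_kind::IGPU: igpus.push_back(d); break;
        }
    }

    std::vector<sel_device> selected;
    selected.insert(selected.end(), rpc.begin(), rpc.end());
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    const bool include_igpu = gate_on_gpus_only ? gpus.empty() : selected.empty();
    if (include_igpu) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

For a machine with one iGPU and one RPC peer, the old gate in this model selects only the RPC device, while the new gate selects both. When a dedicated GPU is present, both gates leave the iGPU out.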
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
…23868)

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
…23868)
