ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device by fl0rianr · Pull Request #23007 · ggml-org/llama.cpp

fl0rianr · 2026-05-13T09:02:42Z

Overview

Report CUDA/HIP devices as GGML_BACKEND_DEVICE_TYPE_IGPU when the runtime reports
cudaDeviceProp::integrated.

This intentionally checks cudaDeviceProp directly instead of
ggml_cuda_info().devices[id].integrated, since the latter is temporarily disabled
by design for CUDA buffer allocation behavior due to #15034. So this change only affects the automatic backend
device classification and does not re-enable the integrated-buffer path.
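
To illustrate the split described above, here is a minimal sketch, not the actual patch. The names `runtime_props`, `cached_info`, `device_type`, `classify_device` and `use_integrated_buffers` are hypothetical stand-ins; only the `integrated` field of `cudaDeviceProp` and the two enum meanings come from the description. It assumes the cached flag stays forced to false, so the classification and the buffer path read from different sources.

```cpp
// Hypothetical stand-ins for illustration only; these are NOT the real ggml or CUDA types.
enum class device_type { GPU, IGPU };

// Mirrors the single field of cudaDeviceProp that matters here.
struct runtime_props {
    int integrated; // non-zero when the runtime reports an integrated device
};

// Mirrors the cached per-device info. Its flag is assumed to be kept false
// so that the integrated-buffer path stays off (see #15034).
struct cached_info {
    bool integrated;
};

// Device classification reads the runtime property directly.
device_type classify_device(const runtime_props & prop) {
    return prop.integrated ? device_type::IGPU : device_type::GPU;
}

// Buffer allocation keeps consulting the cached flag, which this change does not touch.
bool use_integrated_buffers(const cached_info & info) {
    return info.integrated;
}
```

With `runtime_props{1}` and `cached_info{false}`, the device would be classified as IGPU while the integrated-buffer path remains disabled.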

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

0cc4m · 2026-05-13T11:14:55Z

It's good to see the IGPU type expand to other backends. We can use it to adapt downstream behaviour according to the device type, e.g. to disable mmap on integrated GPUs.

JohannesGaessler

My understanding is that the only change this should currently make in terms of program logic is which ggml backend devices llama.cpp prioritizes.

0cc4m · 2026-05-17T16:43:12Z

Yes, currently this only affects situations where you have both an integrated and one or multiple dedicated GPUs available. In that case it defaults to the dedicated GPU(s), not the integrated one. It can be overridden with --devices.

fl0rianr · 2026-05-17T16:43:52Z

Yes, it only has an effect if something acts on that flag. At the moment nothing besides de-duplication is done with it in llama.cpp, which is not critical for iGPUs IMHO.



After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made `model->devices` non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified memory), this caused all tensors to be allocated on the RPC peer alone and model loading to fail. Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer suppress the local iGPU. closes: #23858
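
The gating change described in this commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the llama.cpp implementation: `device`, `dev_kind` and `select_devices` are hypothetical names, and the `gate_on_local_gpus` parameter exists only to contrast the old and new checks side by side.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration; not the real llama.cpp structures.
enum class dev_kind { GPU, IGPU, RPC };

struct device {
    std::string name;
    dev_kind    kind;
};

// Builds a default device list. `gate_on_local_gpus` switches between the
// behaviour described as broken (false) and the described fix (true).
std::vector<device> select_devices(const std::vector<device> & available, bool gate_on_local_gpus) {
    std::vector<device> rpc, gpus, igpus;
    for (const device & d : available) {
        switch (d.kind) {
            case dev_kind::RPC:  rpc.push_back(d);   break;
            case dev_kind::GPU:  gpus.push_back(d);  break;
            case dev_kind::IGPU: igpus.push_back(d); break;
        }
    }

    // RPC peers and local dedicated GPUs are always selected.
    std::vector<device> selected = rpc;
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    // Old check: any selected device, including an RPC peer, suppresses the iGPU.
    // New check: only a local dedicated GPU suppresses it.
    const bool skip_igpu = gate_on_local_gpus ? !gpus.empty() : !selected.empty();
    if (!skip_igpu) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

For one RPC peer plus one local iGPU, the old check yields only the RPC device, while the new check keeps both. A local dedicated GPU still takes precedence over the iGPU in either mode.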


ggml: auto apply iGPU flag CUDA/HIP if integrated device

710ec24

fl0rianr requested a review from a team as a code owner May 13, 2026 09:02

fl0rianr mentioned this pull request May 13, 2026

common: improve --fit host-memory accounting for CPU and iGPU #22922

Open

github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels May 13, 2026

fl0rianr changed the title ~~ggml: auto apply iGPU flag for CUDA/HIP if integrated device~~ ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device May 13, 2026

JohannesGaessler approved these changes May 17, 2026


am17an approved these changes May 17, 2026


mapatel-amd mentioned this pull request May 27, 2026

[Issue]: ROCm unable to load models and fails to fold 26.5.1 ROCm/ROCm#6227

Open

JohannesGaessler merged commit 30af6e2 into ggml-org:master May 28, 2026
48 checks passed

adrianhoehne pushed a commit to adrianhoehne/llama.cpp that referenced this pull request May 28, 2026

ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007)

cc553dc

Illuminati-CRAZ mentioned this pull request May 29, 2026

Misc. bug: Model load not allocating tensors to Strix Halo host memory when using RPC with another Strix Halo device #23858

Closed

rgerganov mentioned this pull request May 29, 2026

llama : do not skip iGPU when only RPC devices are present #23868

Merged

0cc4m mentioned this pull request May 30, 2026

llama: only use one iGPU device by default #23897

Merged

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007)

49e18e0

mapatel-amd mentioned this pull request Jun 1, 2026

Misc. bug: ggml-cuda: restore prop.integrated for HIP builds; #16308 hardcode breaks iGPU classification and supports_buft for AMD APUs #23977

Open

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

ggml: auto apply iGPU flag CUDA/HIP if integrated device (ggml-org#23007)

17c8890

JohannesGaessler merged 1 commit into ggml-org:master from fl0rianr:fix/auto-apply_iGPU_flag

4 participants
