ggml-cuda: auto apply iGPU flag for CUDA/HIP if integrated device #23007
It's good to see the IGPU type expand to other backends. We can use it to adapt downstream behaviour to the device type, e.g. to disable mmap on integrated GPUs.
JohannesGaessler left a comment:
My understanding is that the only change this should currently make in terms of program logic is which ggml backend devices llama.cpp prioritizes.
Yes, currently this only affects situations where both an integrated GPU and one or more dedicated GPUs are available. In that case llama.cpp defaults to the dedicated GPU(s) rather than the integrated one. It can be overridden with …
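A minimal sketch of that default, assuming hypothetical `device`/`dev_type` types in place of ggml's backend device handles: dedicated GPUs are preferred, and integrated GPUs are only used as a fallback when no dedicated GPU is present. The manual override mentioned above is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device record; llama.cpp works with ggml backend device handles instead.
enum class dev_type { CPU, GPU, IGPU };
struct device { std::string name; dev_type type; };

// Default GPU selection as described in the thread: prefer dedicated GPUs and
// fall back to integrated GPUs only when no dedicated GPU was found.
std::vector<device> default_gpu_devices(const std::vector<device> & all) {
    std::vector<device> gpus, igpus;
    for (const auto & d : all) {
        if (d.type == dev_type::GPU)  gpus.push_back(d);
        if (d.type == dev_type::IGPU) igpus.push_back(d);
        // CPU devices are handled separately and are not selected here.
    }
    return gpus.empty() ? igpus : gpus;
}
```

With one iGPU and one dGPU, only the dGPU is returned; with an iGPU alone, the iGPU is returned.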
Yes, it only has an effect if something acts on that flag. At the moment, llama.cpp does nothing with it besides de-duplication, which is not critical for iGPUs IMHO.
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device selection logic dropped the local iGPU whenever any RPC server was added, because RPC devices made `model->devices` non-empty. On systems where the "iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified memory), this caused all tensors to be allocated on the RPC peer alone and model loading to fail. Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer suppress the local iGPU. closes: #23858
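The difference between the two gates can be illustrated with a small sketch. This is a hypothetical reconstruction of the selection order described in the commit message (RPC peers first, then local dedicated GPUs, then local iGPUs), not llama.cpp's actual code; the `select_devices` function, the `gate_on_gpus` switch, and the data types are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types; llama.cpp uses ggml backend device handles.
enum class dev_type { GPU, IGPU };
struct device { std::string name; dev_type type; };

// gate_on_gpus == false: old gate, iGPUs added only if model_devices is empty.
// gate_on_gpus == true:  new gate, iGPUs added only if no local dedicated GPU exists.
std::vector<std::string> select_devices(const std::vector<std::string> & rpc_devices,
                                        const std::vector<device> & local,
                                        bool gate_on_gpus) {
    std::vector<std::string> model_devices(rpc_devices);  // RPC peers come first
    std::vector<std::string> gpus, igpus;
    for (const auto & d : local) {
        if (d.type == dev_type::GPU) {
            gpus.push_back(d.name);
        } else {
            igpus.push_back(d.name);
        }
    }
    model_devices.insert(model_devices.end(), gpus.begin(), gpus.end());
    // Old gate: any RPC device makes model_devices non-empty and drops the iGPU.
    // New gate: only a local dedicated GPU suppresses the iGPU.
    const bool add_igpus = gate_on_gpus ? gpus.empty() : model_devices.empty();
    if (add_igpus) {
        model_devices.insert(model_devices.end(), igpus.begin(), igpus.end());
    }
    return model_devices;
}
```

On a Strix Halo-like setup (one local iGPU plus one RPC peer), the old gate yields only the RPC device, while the new gate keeps both.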
Overview
Report CUDA/HIP devices as `GGML_BACKEND_DEVICE_TYPE_IGPU` when the runtime reports `cudaDeviceProp::integrated`.

This intentionally checks `cudaDeviceProp` directly instead of `ggml_cuda_info().devices[id].integrated`, since the latter is temporarily disabled by design for the CUDA buffer allocation behavior due to #15034. This change therefore only affects the automatic backend device classification and does not re-enable the integrated-buffer path.
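The classification rule can be sketched as follows. This is a minimal illustration, not the PR's code: `dev_type` is a hypothetical stand-in for the `GGML_BACKEND_DEVICE_TYPE_*` values, and `classify_cuda_device`/`query_cuda_device` are hypothetical helper names. The runtime query is only compiled under `nvcc`, so the classification logic itself builds as plain C++.

```cpp
#include <cassert>

// Hypothetical stand-in for ggml's device-type values (the real enum also has CPU and ACCEL).
enum class dev_type { GPU, IGPU };

// A device whose runtime properties report integrated != 0 is treated as an
// integrated GPU; everything else is a dedicated GPU.
// The argument mirrors the int field cudaDeviceProp::integrated.
dev_type classify_cuda_device(int integrated) {
    return integrated ? dev_type::IGPU : dev_type::GPU;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>
// Read the flag straight from cudaDeviceProp rather than from a cached
// per-device info struct, as the PR description explains.
dev_type query_cuda_device(int device) {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return dev_type::GPU;  // fallback on error: an assumption of this sketch
    }
    return classify_cuda_device(prop.integrated);
}
#endif
```

Keeping the decision in a pure function makes the rule testable without a GPU; only the property query depends on the CUDA runtime.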
Requirements
NO