llama: only use one iGPU device by default by 0cc4m · Pull Request #23897 · ggml-org/llama.cpp

0cc4m · 2026-05-30T05:58:38Z

Overview

After #23007, Vulkan is no longer the only backend that reports devices as iGPU, so multiple backends can now report the same iGPU. On my DGX Spark that leads to the model being split between CUDA and Vulkan.

This is the simplest solution: only ever allow a single iGPU by default. I think that there should never be a case with multiple iGPUs, so this is okay. The dGPU deduplication logic by device_id would also work on DGX Spark and (Linux) AMD, but I don't think it is needed here.
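A minimal sketch of the "single iGPU" policy described above, written against stand-in types rather than the real ggml device handles. `SketchDevice`, `SketchDeviceType`, and `keep_first_igpu` are hypothetical names used only for illustration; this is not the code from the PR, and it assumes the first reported iGPU entry is the one to keep.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the backend device handles (not the ggml types).
enum class SketchDeviceType { CPU, GPU, IGPU };

struct SketchDevice {
    std::string      name;  // e.g. "CUDA0" or "Vulkan0"
    SketchDeviceType type;
};

// Keep only the first device reported as an iGPU. Any later iGPU entry is
// assumed to be the same physical device seen through another backend.
std::vector<SketchDevice> keep_first_igpu(const std::vector<SketchDevice> & reported) {
    std::vector<SketchDevice> igpus;
    for (const SketchDevice & dev : reported) {
        if (dev.type == SketchDeviceType::IGPU && igpus.empty()) {
            igpus.push_back(dev);
        }
    }
    return igpus;
}

// Usage: two backends report the same iGPU, only the first entry survives.
void example_keep_first_igpu() {
    const std::vector<SketchDevice> reported = {
        {"CUDA0",   SketchDeviceType::IGPU},
        {"Vulkan0", SketchDeviceType::IGPU},
    };
    const std::vector<SketchDevice> igpus = keep_first_igpu(reported);
    assert(igpus.size() == 1);
    assert(igpus[0].name == "CUDA0");
}
```

Which backend wins in this sketch depends purely on the order in which devices are reported.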

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES

Djip007 · 2026-06-01T18:27:26Z

Not a good idea, it will break the quad MI300A APU.
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/data-sheets/amd-instinct-mi300a-data-sheet.pdf

0cc4m · 2026-06-02T06:49:10Z

You can open an issue or propose a solution once you have access to one. Random criticism just according to a datasheet is rather dubious.

Djip007 · 2026-06-03T10:18:13Z

Sorry, I only wanted to help.
And yes, my comment was a bit harsh. (I had a rough day, and it has nothing to do with this project.)

> I think that there should never be a case with multiple iGPUs, so this is okay.

I just wanted to report that this is not the case, although it is the only example I know of (for the record, it powers the fastest HPC system on the TOP500).

> You can open an issue or propose a solution once you have access to one

I would really like to have access to one of them; I'm sure it would be great. But no, I don't.
And yes, if I get access I will create an issue and a PR.

> On my DGX Spark that leads to the model being split between CUDA and Vulkan.

Just out of curiosity: in what case do you need to activate both backends?

0cc4m · 2026-06-03T11:32:33Z

No worries then. But the idea of an iGPU is "sharing memory with the host", and I have no idea how that would work with multiple GPUs. Might be an interesting edge case eventually. For now this change should be correct.

I run CUDA+Vulkan on my Spark to be able to test both backends without recompiling. They can (usually) coexist without problems; by default, CUDA is preferred for devices that support both.
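The device_id deduplication mentioned in the overview, and the "prefer CUDA" behaviour described here, could look roughly like the sketch below. It uses a hypothetical `GpuEntry` struct instead of the real ggml device properties, and it assumes the CUDA device is enumerated before the Vulkan one, so "first entry wins" ends up preferring CUDA. Entries without a device id are never treated as duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a device plus the id reported in its properties.
struct GpuEntry {
    std::string name;       // e.g. "CUDA0" or "Vulkan0"
    std::string device_id;  // illustrative PCI-style id; empty if unknown
};

// Keep the first entry for each non-empty device_id; entries with an
// unknown (empty) id are always kept.
std::vector<GpuEntry> dedup_by_device_id(const std::vector<GpuEntry> & reported) {
    std::vector<GpuEntry> unique;
    for (const GpuEntry & dev : reported) {
        const bool seen = !dev.device_id.empty() &&
            std::any_of(unique.begin(), unique.end(), [&dev](const GpuEntry & u) {
                return u.device_id == dev.device_id;
            });
        if (!seen) {
            unique.push_back(dev);
        }
    }
    return unique;
}

// Usage: the same physical GPU reported by two backends collapses to one entry.
void example_dedup_by_device_id() {
    const std::vector<GpuEntry> reported = {
        {"CUDA0",   "0000:01:00.0"},
        {"Vulkan0", "0000:01:00.0"},
    };
    const std::vector<GpuEntry> unique = dedup_by_device_id(reported);
    assert(unique.size() == 1);
    assert(unique[0].name == "CUDA0");
}
```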

Djip007 · 2026-06-03T13:11:20Z

Without the proper hardware it is always hard to know what to do. ;)

There are more details here: https://arxiv.org/pdf/2508.11298 . To me it looks like every GPU can access all RAM, in the same way a CPU accesses the RAM of other NUMA nodes (but that's just my understanding).

One possibility is to add all devices from the same backend (until we get heterogeneous multi-iGPU support?).
Like:

    if (igpus.empty()) {
        igpus.push_back({false, dev});
    } else {
        // only add devices from the same backend as the first iGPU (for MI300A?)
        ggml_backend_reg_t reg  = ggml_backend_dev_backend_reg(dev);
        ggml_backend_reg_t reg0 = ggml_backend_dev_backend_reg(igpus[0].dev);
        // ??? can we compare the reg pointer?
        // ggml_backend_reg_name() returns a const char *, so compare the contents
        // with strcmp (needs <cstring>); `==` would only compare the pointers
        if (strcmp(ggml_backend_reg_name(reg), ggml_backend_reg_name(reg0)) == 0) {
            igpus.push_back({false, dev});
        }
    }

But I can't test it ...
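For readers without the llama.cpp sources at hand, the same-backend idea proposed above can be sketched in a self-contained form. `IgpuCandidate` and `keep_same_backend_igpus` are hypothetical names, and the backend strings are illustrative only; the sketch is untested on real multi-iGPU hardware, just like the proposal itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an iGPU device and the backend that reported it.
struct IgpuCandidate {
    std::string backend;  // illustrative backend name, e.g. "ROCm" or "Vulkan"
    std::string name;     // illustrative device name, e.g. "ROCm0"
};

// Keep the first iGPU, then only further iGPUs reported by that same backend.
// Entries from other backends are assumed to be duplicates of devices that
// were already added.
std::vector<IgpuCandidate> keep_same_backend_igpus(const std::vector<IgpuCandidate> & reported) {
    std::vector<IgpuCandidate> igpus;
    for (const IgpuCandidate & dev : reported) {
        if (igpus.empty() || dev.backend == igpus.front().backend) {
            igpus.push_back(dev);
        }
    }
    return igpus;
}

// Usage: a node with several APUs seen through two backends keeps only the
// devices of the first backend.
void example_keep_same_backend_igpus() {
    const std::vector<IgpuCandidate> reported = {
        {"ROCm",   "ROCm0"},
        {"ROCm",   "ROCm1"},
        {"Vulkan", "Vulkan0"},
        {"Vulkan", "Vulkan1"},
    };
    const std::vector<IgpuCandidate> igpus = keep_same_backend_igpus(reported);
    assert(igpus.size() == 2);
    assert(igpus[0].name == "ROCm0");
    assert(igpus[1].name == "ROCm1");
}
```

Comparing `std::string` values with `==` compares contents, which sidesteps the pointer-comparison question raised in the snippet above.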

0cc4m · 2026-06-04T09:02:33Z

We'll look into it once someone actually tries it and reports a problem. You can always manually override the selection anyways.

llama: only use one iGPU device by default

4ac7221

0cc4m requested a review from ggerganov as a code owner May 30, 2026 05:58

ggerganov approved these changes May 30, 2026

taronaeo approved these changes May 31, 2026

0cc4m merged commit 22cadc1 into master May 31, 2026
27 checks passed

0cc4m deleted the 0cc4m/igpu-deduplication branch May 31, 2026 06:17

turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026

llama: only use one iGPU device by default (ggml-org#23897)

41fd38e

amv mentioned this pull request Jun 4, 2026

Misc. bug: Model layer memory not split to multiple CUDA cards after release b9439 #24147

Open

0cc4m merged 1 commit into master from 0cc4m/igpu-deduplication
