Conversation

@traversaro
Contributor

@traversaro traversaro commented Sep 16, 2025

This PR enables Vulkan support in all builds. At the cost of just ~5 MB more, we get a package that can use the GPU also on non-Nvidia GPUs (I only tested on an integrated Intel GPU; the performance is not at the level of CUDA, but definitely better than CPU).

For example, these are the results on my test machine, a Windows laptop with an NVIDIA GeForce RTX 4050, running the Linux binaries via WSL:

### CUDA

(llama) traversaro@IITBMP014LW012:~/pixiws/llama$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4050 Laptop GPU, compute capability 8.9, VMM: yes
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  99 |           pp512 |     7515.60 ± 189.31 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  99 |           tg128 |        117.15 ± 2.45 |

build: c823a55 (137)

### Vulkan on Discrete Nvidia GPU

(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Microsoft Direct3D12 (NVIDIA GeForce RTX 4050 Laptop GPU) (Dozen) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |      2820.45 ± 57.78 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |         78.16 ± 2.94 |


### Vulkan on Integrated Intel GPU

export GGML_VK_VISIBLE_DEVICES=1
(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Microsoft Direct3D12 (Intel(R) Iris(R) Xe Graphics) (Dozen) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 16 | shared memory: 32768 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |       752.84 ± 34.93 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |         43.39 ± 1.81 |

### CPU 

(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ GGML_VK_VISIBLE_DEVICES= llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 0 Vulkan devices:
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |        148.64 ± 5.85 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |          8.81 ± 0.09 |

build: c823a55 (137)

In case users want to use the CPU for any reason, they can do so by passing --device none to llama-cli or llama-server.
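For illustration, here is a minimal sketch of both selection mechanisms (the model path is a placeholder; GGML_VK_VISIBLE_DEVICES is the same variable used in the benchmarks above):

```bash
# Force pure CPU execution even on a build with Vulkan enabled.
llama-cli --device none -m model.gguf -p "Hello"

# Restrict ggml's Vulkan backend to the device with index 1,
# as done for the integrated Intel GPU benchmark above.
export GGML_VK_VISIBLE_DEVICES=1
llama-bench -m model.gguf
```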

Things to discuss:

  • With this change, the cpu variant also supports GPUs (via Vulkan). Is it ok to keep calling it cpu, or should we use another name?
  • Even if Nvidia GPUs are supported via Vulkan, using the CUDA backend directly typically gives better performance, so on systems with __cuda defined the cuda package will continue to be installed. For completeness, I also included the Vulkan backend in CUDA builds, to let users easily test it with the --device option, as the cost is just ~5 MB more. However, we can also disable it.
  • Similar point for macOS: I enabled the Vulkan backend on macOS, but by default macOS does not ship any Vulkan-capable driver, so the metal version will continue to be used. Perhaps we can disable Vulkan there? Answer in #64 (comment).

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@traversaro
Contributor Author

@conda-forge-admin, please rerender

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17761278666. Examine the logs at this URL for more detail.

@traversaro
Contributor Author

traversaro commented Sep 16, 2025

It seems that we need to understand how to correctly handle cross-compilation for vulkan-shaders-gen (just a nice-to-have for osx-arm64, but quite important for linux-aarch64).

@traversaro
Contributor Author

xref: ggml-org/llama.cpp#10448

@traversaro traversaro marked this pull request as draft September 16, 2025 09:09
@traversaro
Contributor Author

The test is failing with:

 │ Testing commands:
 │ llama-cli: error while loading shared libraries: libvulkan.so.1: cannot open shared object file: No such file or directory
 │ × error Script failed with status 127

I am not sure why the package build worked locally.
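
For what it's worth, a quick local check for this class of failure (assuming a Linux environment where `ldd` is available) would be:

```bash
# Inspect the dynamic dependencies of the packaged binary; a
# "not found" entry for libvulkan.so.1 reproduces the CI error and
# suggests a missing run dependency on the Vulkan loader.
ldd "${PREFIX}/bin/llama-cli" | grep -i vulkan
```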

@traversaro
Contributor Author

Besides Nvidia and Intel, I also wanted to test on AMD, but unfortunately the only AMD GPU I have access to is an MI300X, which is not currently supported by the AMD Vulkan driver.

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint --conda-forge . from the recipe directory. You can also examine the workflow logs for more detail.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17765048761. Examine the logs at this URL for more detail.

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17779036826. Examine the logs at this URL for more detail.

@traversaro traversaro changed the title from "Enable Vulkan support" to "Enable Vulkan support to support non-Nvidia GPUs" Sep 16, 2025
@traversaro
Contributor Author

osx-arm64 still fails, as the LDFLAGS of the cross-compile build conflict with those of the native build, see:

 │ │ FAILED: [code=1] $SRC_DIR/build/bin/vulkan-shaders-gen 
 │ │ : && $BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-clang++ -O2 -O3 -DNDEBUG -isysroot /Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk -mmacosx-version-min=11.0 -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,$PREFIX/lib -L$PREFIX/lib CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o $SRC_DIR/build/bin/vulkan-shaders-gen   && :
 │ │ ld: warning: ignoring file $PREFIX/lib/libc++.dylib, building for macOS-x86_64 but attempting to link with file built for macOS-arm64
 │ │ Undefined symbols for architecture x86_64:

Typically this is solved with a separate build; see for example https://github.com/conda-forge/gz-msgs-feedstock/blob/5275f5d8402b69aeebf5f766620db5b148cc9c88/recipe/build_cxx.sh#L15 .
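
For reference, a rough sketch of that pattern applied here, borrowed from the gz-msgs example (untested; the flag handling and invocation details are assumptions):

```bash
# When cross-compiling, first build vulkan-shaders-gen with the
# *build*-platform compilers in a separate directory, so that the
# target-platform LDFLAGS/CXXFLAGS do not leak into the native link.
if [[ "${CONDA_BUILD_CROSS_COMPILATION:-0}" == "1" ]]; then
  (
    unset LDFLAGS CXXFLAGS  # drop the cross-compile flags in this subshell
    cmake -S . -B build_native \
      -DCMAKE_C_COMPILER="${CC_FOR_BUILD}" \
      -DCMAKE_CXX_COMPILER="${CXX_FOR_BUILD}" \
      -DGGML_VULKAN=ON
    cmake --build build_native --target vulkan-shaders-gen
  )
fi
```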

@traversaro
Contributor Author

Alternatively, we can define a toolchain file to pass to the GGML_VULKAN_SHADERS_GEN_TOOLCHAIN option, in which we set the CMAKE_SHARED_LINKER_FLAGS_INIT variable to ensure that LDFLAGS is ignored.
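
A minimal sketch of that alternative (untested; the file name is arbitrary, and clearing ENV{LDFLAGS} is added as a belt-and-braces measure on top of the *_INIT variables):

```bash
# Toolchain file handed to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN; the
# quoted heredoc keeps $ENV{...} literal so CMake expands it.
cat > "${SRC_DIR}/shaders-gen-toolchain.cmake" <<'EOF'
# Use the build-platform compilers for the generator.
set(CMAKE_C_COMPILER "$ENV{CC_FOR_BUILD}")
set(CMAKE_CXX_COMPILER "$ENV{CXX_FOR_BUILD}")
# Start from empty linker flags so the cross-compile LDFLAGS are ignored.
set(CMAKE_EXE_LINKER_FLAGS_INIT "")
set(CMAKE_SHARED_LINKER_FLAGS_INIT "")
# Also hide LDFLAGS from CMake's environment-based initialization.
set(ENV{LDFLAGS} "")
EOF

cmake -S . -B build \
  -DGGML_VULKAN_SHADERS_GEN_TOOLCHAIN="${SRC_DIR}/shaders-gen-toolchain.cmake"
```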

@traversaro
Contributor Author

traversaro commented Sep 16, 2025

Alternatively, we can define a toolchain file to pass to the GGML_VULKAN_SHADERS_GEN_TOOLCHAIN option, in which we set the CMAKE_SHARED_LINKER_FLAGS_INIT variable to ensure that LDFLAGS is ignored.

Similar point for macOS: I enabled the Vulkan backend on macOS, but by default macOS does not ship any Vulkan-capable driver, so the metal version will continue to be used. Perhaps we can disable Vulkan there?

I ended up disabling Vulkan on macOS, as I doubt it would be useful there and it requires complex workarounds to work correctly.

@traversaro traversaro changed the title from "Enable Vulkan support to support non-Nvidia GPUs" to "Enable Vulkan support on Linux and Windows to support non-Nvidia GPUs" Sep 16, 2025
@traversaro traversaro marked this pull request as ready for review September 16, 2025 21:07
@traversaro
Contributor Author

@conda-forge/llama.cpp the PR is ready for discussion and review, thanks!

Member

@jjerphan jjerphan left a comment


Thank you for the contribution!

While supporting Vulkan is valuable, I think I would slightly be in favor of having dedicated build variants for it. Ideally, those build variants would be picked after hardware detection (e.g. like with the __cuda dependency), but I guess a CEP is necessary for this.
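
(For context, a hedged illustration of how the existing __cuda-based selection already behaves from the user side; not part of this PR:)

```bash
# conda exposes detected hardware as virtual packages; __cuda is
# present when a CUDA driver is detected, letting the solver prefer
# the cuda variant over cpu.
conda info | grep -i __cuda

# The detection can also be overridden explicitly.
CONDA_OVERRIDE_CUDA="12.0" conda install llama.cpp
```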

Also could you become a maintainer of this feedstock?

Comment on lines +41 to +45
# Vulkan is only useful on Linux, as on macOS
# metal is used
{%- if linux %}
${{ ggml_args("VULKAN=ON") }}
{%- endif %}
Member

@jjerphan jjerphan Sep 17, 2025


LGTM.

@traversaro
Contributor Author

Also could you become a maintainer of this feedstock?

Sure!

@jjerphan
Member

Let me have you approve #66 (comment).

@traversaro
Contributor Author

While supporting Vulkan is valuable, I think I would slightly be in favor of having dedicated build variants for it. Ideally, those build variants would be picked after hardware detection (e.g. like with the __cuda dependency), but I guess a CEP is necessary for this.

Ack! Indeed, some kind of Vulkan-based hardware detection would be cool, even if it is a longer-term effort (interestingly, even on the WheelNext variant-providers side I could not find anything related to Vulkan), and even if, in general, virtual packages and the corresponding variants have the downside of making lock files more complex to deal with.

Just to understand: what is the downside that makes you prefer a separate vulkan variant? Just the increase in package size, or something else?

@jjerphan
Member

The size of packages could continue to grow over time if we continue to optimize builds.

I am fine with this approach since elements for packaging this variant are missing, but I would like to have the other maintainers' perspective before approving.

@traversaro
Contributor Author

The size of packages could continue to grow over time if we continue to optimize builds.

I totally agree on this; it is a common concern in binary-based package distributions like conda-forge or Linux distros. Source-based distributions typically mitigate the problem with optional flags to enable/disable features, but there, too, a lot of optional flags can really complicate caching artifacts.

My impression is that, given the tradeoff between "increase in size" and "convenience", the call needs to be made case by case; it is difficult to decide in general (or to anticipate what will happen in the future, which is hard to forecast). In this case the increase in size seems to me justified by the increased usefulness, but I totally understand if other maintainers have a different opinion.

Member

@jjerphan jjerphan left a comment


Not sure if the other maintainers are going to reply.

@jjerphan
Member

How about introducing a vulkan variant next to the already existing cpu and cuda variants?

@traversaro
Contributor Author

How about introducing a vulkan variant next to the already existing cpu and cuda variants?

If you prefer, I can do that. The question is: without any virtual package to detect whether Vulkan is available, which variant should have the higher priority, cpu or vulkan?

@traversaro
Contributor Author

My personal impression/assumption is that most users actually have a working Vulkan-capable GPU, so we would do them a good service by making the vulkan variant the default, while users without a GPU can manually specify the cpu variant.
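
Whichever variant wins by default, users can always pin one explicitly via the build string (a sketch; the exact build-string patterns for this feedstock are assumptions):

```bash
# Hypothetical build-string selectors; the real patterns depend on how
# the variant name is encoded in the build string.
conda install "llama.cpp=*=*cpu*"     # force the CPU-only variant
conda install "llama.cpp=*=*vulkan*"  # force the Vulkan variant
```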

@jjerphan
Member

I would not weight variants for now.

I do not have time to garden this feedstock. Feel free to implement the changes.

@traversaro
Contributor Author

I would not weight variants for now.

Why? If we do not give priority to one variant or another, in my experience it is quite difficult to understand which variant gets installed by default.

I do not have time to garden this feedstock. Feel free to implement the changes.

Ack, I will wait to see if anyone else is interested in this, as I am not in a hurry anyway.

@jjerphan
Member

jjerphan commented Nov 4, 2025

@traversaro: I do not think that other maintainers will reply.

Feel free to rebase and merge this PR.

@traversaro
Contributor Author

@traversaro: I do not think that other maintainers will reply.

Yes, I sort of figured that out. :)

However, I think I have a way to address most comments/concerns in this PR; I need to update the PR accordingly.

@traversaro
Contributor Author

Closing until I update the PR for clarity.

@traversaro traversaro closed this Nov 4, 2025