Conversation

@traversaro
Contributor

@traversaro traversaro commented Sep 16, 2025

This PR enables Vulkan support in all builds. At the cost of just ~5 MB more, we get a package that can use the GPU also on non-Nvidia GPUs (I only tested on an integrated Intel GPU; the performance is not at the level of CUDA, but definitely better than CPU).

For example, these are the results on my test machine, a Windows laptop with an NVIDIA GeForce RTX 4050, running the Linux binaries via WSL:

### CUDA

(llama) traversaro@IITBMP014LW012:~/pixiws/llama$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4050 Laptop GPU, compute capability 8.9, VMM: yes
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  99 |           pp512 |     7515.60 ± 189.31 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  99 |           tg128 |        117.15 ± 2.45 |

build: c823a55 (137)

### Vulkan on Discrete Nvidia GPU

(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Microsoft Direct3D12 (NVIDIA GeForce RTX 4050 Laptop GPU) (Dozen) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |      2820.45 ± 57.78 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |         78.16 ± 2.94 |


### Vulkan on Integrated Intel GPU

export GGML_VK_VISIBLE_DEVICES=1
(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Microsoft Direct3D12 (Intel(R) Iris(R) Xe Graphics) (Dozen) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 16 | shared memory: 32768 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |       752.84 ± 34.93 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |         43.39 ± 1.81 |

### CPU 

(llama-vulkan) traversaro@IITBMP014LW012:~/pixiws/llama/llama.cpp-feedstock$ GGML_VK_VISIBLE_DEVICES= llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 0 Vulkan devices:
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           pp512 |        148.64 ± 5.85 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | Vulkan,BLAS |      10 |           tg128 |          8.81 ± 0.09 |

build: c823a55 (137)

In case users want to use the CPU for any reason, they can do so by passing --device none to llama-cli or llama-server.
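For illustration, here is a minimal sketch of both selection mechanisms (the model path is a placeholder; GGML_VK_VISIBLE_DEVICES is the same variable used in the benchmarks above):

```bash
# Force pure CPU execution even on a build with Vulkan enabled.
llama-cli --device none -m model.gguf -p "Hello"

# Restrict ggml's Vulkan backend to the device with index 1,
# as done for the integrated Intel GPU benchmark above.
export GGML_VK_VISIBLE_DEVICES=1
llama-bench -m model.gguf
```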

Things to discuss:

  • With this change, the cpu variant also supports GPUs (via Vulkan). Is it ok to keep calling it cpu, or should we use another name?
  • Even if Nvidia GPUs are supported via Vulkan, using the CUDA backend directly typically gives better performance, so on systems with __cuda defined the cuda package will continue to be installed. For completeness, I also included the Vulkan backend in CUDA builds, to let users easily test it with the --device option, as the cost is just ~5 MB more. However, we can also disable it.
  • Similar point for macOS: I enabled the Vulkan backend on macOS, but by default macOS does not ship any Vulkan-capable driver, so the metal version will continue to be used. Perhaps we can disable Vulkan there? Answer in #64 (comment).

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@traversaro
Contributor Author

@conda-forge-admin, please rerender

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17761278666. Examine the logs at this URL for more detail.

@traversaro
Contributor Author

traversaro commented Sep 16, 2025

It seems that we need to understand how to correctly handle cross-compilation for vulkan-shaders-gen (just a nice-to-have for osx-arm64, but quite important for linux-aarch64).

@traversaro
Contributor Author

xref: ggml-org/llama.cpp#10448

@traversaro traversaro marked this pull request as draft September 16, 2025 09:09
@traversaro
Contributor Author

The test is failing with:

 │ Testing commands:
 │ llama-cli: error while loading shared libraries: libvulkan.so.1: cannot open shared object file: No such file or directory
 │ × error Script failed with status 127

I am not sure why the package build worked locally.
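
For what it's worth, a quick local check for this class of failure (assuming a Linux environment where `ldd` is available) would be:

```bash
# Inspect the dynamic dependencies of the packaged binary; a
# "not found" entry for libvulkan.so.1 reproduces the CI error and
# suggests a missing run dependency on the Vulkan loader.
ldd "${PREFIX}/bin/llama-cli" | grep -i vulkan
```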

@traversaro
Contributor Author

Besides Nvidia and Intel, I also wanted to test on AMD, but unfortunately the only AMD GPU I have access to is an MI300X, which is not currently supported by the AMD Vulkan driver.

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint --conda-forge . from the recipe directory. You can also examine the workflow logs for more detail.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17765048761. Examine the logs at this URL for more detail.

@conda-forge-admin
Contributor

conda-forge-admin commented Sep 16, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/recipe.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/recipe.yaml:

  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/17779036826. Examine the logs at this URL for more detail.

@traversaro traversaro changed the title from "Enable Vulkan support" to "Enable Vulkan support to support non-Nvidia GPUs" Sep 16, 2025
@traversaro
Contributor Author

osx-arm64 still fails, as the LDFLAGS of the cross-compile build conflict with those of the native build, see:

 │ │ FAILED: [code=1] $SRC_DIR/build/bin/vulkan-shaders-gen 
 │ │ : && $BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-clang++ -O2 -O3 -DNDEBUG -isysroot /Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk -mmacosx-version-min=11.0 -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,$PREFIX/lib -L$PREFIX/lib CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o $SRC_DIR/build/bin/vulkan-shaders-gen   && :
 │ │ ld: warning: ignoring file $PREFIX/lib/libc++.dylib, building for macOS-x86_64 but attempting to link with file built for macOS-arm64
 │ │ Undefined symbols for architecture x86_64:

Typically this is solved with a separate build; see for example https://github.com/conda-forge/gz-msgs-feedstock/blob/5275f5d8402b69aeebf5f766620db5b148cc9c88/recipe/build_cxx.sh#L15 .
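
For reference, a rough sketch of that pattern applied here, borrowed from the gz-msgs example (untested; the flag handling and invocation details are assumptions):

```bash
# When cross-compiling, first build vulkan-shaders-gen with the
# *build*-platform compilers in a separate directory, so that the
# target-platform LDFLAGS/CXXFLAGS do not leak into the native link.
if [[ "${CONDA_BUILD_CROSS_COMPILATION:-0}" == "1" ]]; then
  (
    unset LDFLAGS CXXFLAGS  # drop the cross-compile flags in this subshell
    cmake -S . -B build_native \
      -DCMAKE_C_COMPILER="${CC_FOR_BUILD}" \
      -DCMAKE_CXX_COMPILER="${CXX_FOR_BUILD}" \
      -DGGML_VULKAN=ON
    cmake --build build_native --target vulkan-shaders-gen
  )
fi
```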

@traversaro
Contributor Author

Alternatively, we can define a toolchain file to pass to the GGML_VULKAN_SHADERS_GEN_TOOLCHAIN option, in which we set the CMAKE_SHARED_LINKER_FLAGS_INIT variable to ensure that LDFLAGS is ignored.
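
A minimal sketch of that alternative (untested; the file name is arbitrary, and clearing ENV{LDFLAGS} is added as a belt-and-braces measure on top of the *_INIT variables):

```bash
# Toolchain file handed to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN; the
# quoted heredoc keeps $ENV{...} literal so CMake expands it.
cat > "${SRC_DIR}/shaders-gen-toolchain.cmake" <<'EOF'
# Use the build-platform compilers for the generator.
set(CMAKE_C_COMPILER "$ENV{CC_FOR_BUILD}")
set(CMAKE_CXX_COMPILER "$ENV{CXX_FOR_BUILD}")
# Start from empty linker flags so the cross-compile LDFLAGS are ignored.
set(CMAKE_EXE_LINKER_FLAGS_INIT "")
set(CMAKE_SHARED_LINKER_FLAGS_INIT "")
# Also hide LDFLAGS from CMake's environment-based initialization.
set(ENV{LDFLAGS} "")
EOF

cmake -S . -B build \
  -DGGML_VULKAN_SHADERS_GEN_TOOLCHAIN="${SRC_DIR}/shaders-gen-toolchain.cmake"
```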

@traversaro
Contributor Author

traversaro commented Sep 16, 2025

Alternatively, we can define a toolchain file to pass to the GGML_VULKAN_SHADERS_GEN_TOOLCHAIN option, in which we set the CMAKE_SHARED_LINKER_FLAGS_INIT variable to ensure that LDFLAGS is ignored.

Similar point for macOS: I enabled the Vulkan backend on macOS, but by default macOS does not ship any Vulkan-capable driver, so the metal version will continue to be used. Perhaps we can disable Vulkan there?

I ended up disabling Vulkan on macOS, as I doubt it would be useful there and it requires complex workarounds to work correctly.

@traversaro traversaro changed the title from "Enable Vulkan support to support non-Nvidia GPUs" to "Enable Vulkan support on Linux and Windows to support non-Nvidia GPUs" Sep 16, 2025
@traversaro traversaro marked this pull request as ready for review September 16, 2025 21:07
@traversaro
Contributor Author

@conda-forge/llama.cpp the PR is ready for discussion and review, thanks!

Member

@jjerphan jjerphan left a comment


Thank you for the contribution!

While supporting Vulkan is valuable, I think I would slightly be in favor of having dedicated build variants for it. Ideally, those build variants would be picked after hardware detection (e.g. like with the __cuda dependency), but I guess a CEP is necessary for this.
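
(For context, a hedged illustration of how the existing __cuda-based selection already behaves from the user side; not part of this PR:)

```bash
# conda exposes detected hardware as virtual packages; __cuda is
# present when a CUDA driver is detected, letting the solver prefer
# the cuda variant over cpu.
conda info | grep -i __cuda

# The detection can also be overridden explicitly.
CONDA_OVERRIDE_CUDA="12.0" conda install llama.cpp
```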

Also could you become a maintainer of this feedstock?

Comment on lines +41 to +45
# Vulkan is only useful on Linux, as on macOS
# metal is used
{%- if linux %}
${{ ggml_args("VULKAN=ON") }}
{%- endif %}
Member

@jjerphan jjerphan Sep 17, 2025


LGTM.

@traversaro
Contributor Author

Also could you become a maintainer of this feedstock?

Sure!

@jjerphan
Member

Let me have you approve #66 (comment).

@traversaro
Contributor Author

While supporting Vulkan is valuable, I think I would slightly be in favor of having dedicated build variants for it. Ideally, those build variants would be picked after hardware detection (e.g. like with the __cuda dependency), but I guess a CEP is necessary for this.

Ack! Indeed, some kind of Vulkan-based hardware detection would be cool, even if it is a longer-term effort (interestingly, even on the WheelNext variant-providers side I could not find anything related to Vulkan), and even if, in general, virtual packages and the corresponding variants have the downside of making lock files more complex to deal with.

Just to understand: what is the downside that makes you prefer a separate vulkan variant? Just the increase in package size, or something else?

@jjerphan
Member

The size of packages could continue to grow over time if we continue to optimize builds.

I am fine with this approach since elements for packaging this variant are missing, but I would like to have the other maintainers' perspective before approving.

@traversaro
Contributor Author

The size of packages could continue to grow over time if we continue to optimize builds.

I totally agree on this; it is a common concern in binary-based package distributions like conda-forge or Linux distros. Source-based distributions typically mitigate the problem with optional flags to enable/disable features, but there, too, a lot of optional flags can really complicate caching artifacts.

My impression is that, given the tradeoff between "increase in size" and "convenience", the call needs to be made case by case; it is difficult to decide in general (or to anticipate what will happen in the future, which is hard to forecast). In this case the increase in size seems to me justified by the increased usefulness, but I totally understand if other maintainers have a different opinion.

Member

@jjerphan jjerphan left a comment


Not sure if the other maintainers are going to reply.

@jjerphan
Member

How about introducing a vulkan variant next to the already existing cpu and cuda variants?

@traversaro
Contributor Author

How about introducing a vulkan variant next to the already existing cpu and cuda variants?

If you prefer, I can do that. The question is: without any virtual package to detect whether Vulkan is available, which variant should have the higher priority, cpu or vulkan?

@traversaro
Contributor Author

My personal impression/assumption is that most users actually have a working Vulkan-capable GPU, so we would do them a good service by making the vulkan variant the default, while users without a GPU can manually specify the cpu variant.
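
Whichever variant wins by default, users can always pin one explicitly via the build string (a sketch; the exact build-string patterns for this feedstock are assumptions):

```bash
# Hypothetical build-string selectors; the real patterns depend on how
# the variant name is encoded in the build string.
conda install "llama.cpp=*=*cpu*"     # force the CPU-only variant
conda install "llama.cpp=*=*vulkan*"  # force the Vulkan variant
```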

@jjerphan
Member

I would not weight variants for now.

I do not have time to garden this feedstock. Feel free to implement the changes.

@traversaro
Contributor Author

I would not weight variants for now.

Why? If we do not give priority to one variant or another, in my experience it is quite difficult to understand which variant gets installed by default.

I do not have time to garden this feedstock. Feel free to implement the changes.

Ack, I will wait to see if anyone else is interested in this, as I am not in a hurry anyway.

@jjerphan
Member

jjerphan commented Nov 4, 2025

@traversaro: I do not think that other maintainers will reply.

Feel free to rebase and merge this PR.

@traversaro
Contributor Author

@traversaro: I do not think that other maintainers will reply.

Yes, I sort of figured that out. :)

However, I think I have a way to address most comments/concerns in this PR; I need to update the PR accordingly.

@traversaro
Contributor Author

Closing until I update the PR for clarity.

@traversaro traversaro closed this Nov 4, 2025