
Vulkan #15

Merged: 10 commits into main on Nov 12, 2024

Conversation

@AsbjornOlling (Contributor)

This PR adds the vulkan feature to the llama-cpp-2 dependency.

It also adds a few vulkan dependencies to the shell.nix environment.

@AsbjornOlling (Contributor, Author)

I can no longer reproduce the build-time issue I had before. Strange, it seems to just work?

I also added the three vulkan dependencies to the default nix derivation's buildInputs, which seems to fix nix compilation as well.
They probably don't all need to be in nativeBuildInputs AND buildInputs, but this seems to work.

@AsbjornOlling (Contributor, Author) commented Nov 5, 2024

ggerganov/llama.cpp#9582
Oh wow...
I hit this issue locally. Luckily it's not an issue in CI.

I will add ulimit -n 2048 to our shell.nix

@AsbjornOlling (Contributor, Author)

Running all tests concurrently crashes in CI as well as on my machine locally. I added --test-threads=1 (i.e. cargo test -- --test-threads=1) to fix this for now.

Now for a strange one: the tests run in the CI runner, but I get different token outputs on the CI runner than I do on my machine locally.

@AsbjornOlling (Contributor, Author) commented Nov 7, 2024

Ok so, I set up the nix derivation to also run the unit tests (inside the nix build sandbox), using a software renderer from Mesa.

It reproduces the same error that we see in GitHub Actions. As in, the exact same nonsense tokens.

Run nix build to reproduce the error locally.

@AsbjornOlling (Contributor, Author)

OOOhkay, I think I have a lead: it has to do with over-allocating the "GPU".

It runs fine and passes both tests if I set with_n_gpu_layers(0).

How do we address this? Can we ask the system how much VRAM is available and estimate how many layers fit?
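
Something like the rough sketch below could work, with some loud caveats: with_n_gpu_layers is the llama-cpp-2 builder call we already use, but available_vram_bytes and the per-layer size are hypothetical placeholders. As far as I know llama-cpp-2 doesn't expose a VRAM query, so that number would have to come from a Vulkan device-memory query or from user configuration.

use llama_cpp_2::model::params::LlamaModelParams;

// Hypothetical helper: there is no VRAM query in llama-cpp-2, so this would
// need to be backed by a Vulkan device-memory query (or a user setting).
fn available_vram_bytes() -> u64 {
    8 * 1024 * 1024 * 1024 // placeholder: pretend we have 8 GiB
}

// Rough estimate of how many layers fit, leaving some headroom for the
// KV cache and other allocations. per_layer_bytes would have to come from
// the model's GGUF metadata or a measurement.
fn estimate_gpu_layers(per_layer_bytes: u64, total_layers: u32) -> u32 {
    let budget = available_vram_bytes().saturating_sub(512 * 1024 * 1024);
    ((budget / per_layer_bytes) as u32).min(total_layers)
}

fn main() {
    // Made-up numbers, just to show the flow.
    let n_gpu_layers = estimate_gpu_layers(200 * 1024 * 1024, 32);
    let _params = LlamaModelParams::default().with_n_gpu_layers(n_gpu_layers);
    // ...then load the model with these params as usual.
}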

@AsbjornOlling (Contributor, Author)

I think that the reason I'm not hitting this locally is just that I have plenty of VRAM for the small models we run on my machine.

@AsbjornOlling (Contributor, Author)

But it's not just a question of trying to load a model that's bigger than the VRAM available.

I tried loading Gemma 2 27B, which is way bigger than the VRAM I have on my laptop, but it just fills my GPU and then offloads the rest to my system RAM. That's using Vulkan with my AMD 7700S.

So maybe the error only shows up with a particular software renderer?

@volesen (Contributor) commented Nov 12, 2024

Maybe we should add the following to Cargo.toml and use no features by default.

[target.'cfg(not(target_os = "macos"))'.dependencies]
llama-cpp-2 = { version = "0.1.83", features = ["vulkan"] }

Setting the vulkan feature requires that a Vulkan SDK is found for macOS builds. That is, features = ["vulkan"] is not ignored on macOS.

However, I am not sure whether that is blocked by rust-lang/cargo#1197

@AsbjornOlling changed the title from "Draft: vulkan" to "Vulkan" on Nov 12, 2024
@emilnorsker (Contributor)

Libraries in the .gdextension file should be updated to reflect the new folder structure as well.

Updated:
[libraries]
linux.debug.x86_64 = "res://../nobodywho/target/debug/libnobodywho.so"
linux.release.x86_64 = "res://../nobodywho/target/release/libnobodywho.so"
windows.debug.x86_64 = "res://../nobodywho/target/debug/nobodywho.dll"
windows.release.x86_64 = "res://../nobodywho/target/release/nobodywho.dll"
macos.debug = "res://../nobodywho/target/debug/libnobodywho.dylib"
macos.release = "res://../nobodywho/target/release/libnobodywho.dylib"
macos.debug.arm64 = "res://../nobodywho/target/debug/libnobodywho.dylib"
macos.release.arm64 = "res://../nobodywho/target/release/libnobodywho.dylib"

otherwise cool, lgtm!

@AsbjornOlling merged commit 75a2c28 into main on Nov 12, 2024. 6 checks passed.