Using metal and n_gpu_layers produces no tokens #30
Comments
Is llama.cpp actually using Metal? I tried this and noticed (only after enabling some debug logging) that in fact the file …
I copied over the necessary Metal files; otherwise I would get an error. After copying the files I encountered the no-generated-tokens issue.
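For anyone else checking whether Metal is actually initialized: a minimal sketch, assuming the backend is llama-cpp-python and using a placeholder model path. With verbose=True, llama.cpp's backend messages go to stderr, and a Metal-enabled build prints ggml_metal_init lines when layers are offloaded.

```python
# Minimal sketch, assuming llama-cpp-python; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=1,   # ask for GPU offload
    verbose=True,     # llama.cpp prints backend logs (e.g. ggml_metal_init) to stderr
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```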
AFAIK it does: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#metal-build
llama-cpp-python requires the user to specify … Do users need to do something similar during …?
Reading through here, it seems like llama.cpp needs to be built with specific flags in order for Metal support to work: ggerganov/llama.cpp#1642
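As a cross-check: the llama-cpp-python README describes reinstalling the wheel with the Metal flag enabled (roughly CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python, though flag names have changed across versions), and a sketch like the one below can confirm whether the installed wheel supports GPU offload at all. This assumes a llama-cpp-python version that exposes llama_supports_gpu_offload; older releases may not have it.

```python
# Sketch: check whether this llama-cpp-python build can offload layers at all.
# Assumes a version that exposes llama_supports_gpu_offload(); older wheels may not.
import llama_cpp

if llama_cpp.llama_supports_gpu_offload():
    print("GPU offload available: n_gpu_layers should take effect (Metal/CUDA/...).")
else:
    print("CPU-only build: n_gpu_layers will have no effect; rebuild with Metal enabled.")
```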
I'm running the example script with a few different models.

When not using metal (not using n_gpu_layers) the models generate tokens, ex: …

When I use n_gpu_layers it does not generate tokens, ex: …

Is this a known behavior?