Cuda? #2

Do you have any plans to support the other backends that llama.cpp supports, so that this can be accelerated?
Sorry for the late reply. Yes, I do plan to support CUDA, but because of personal circumstances it might not be implemented for a few months. In the meantime, if you have a GPU, I would suggest using a more mature repo.
Hey @xyzhang626, do you have any resources/pointers/tips on how CUDA is implemented in ggml? Unless I'm missing something, there's basically zero documentation. I've adapted this code to support a slightly different architecture for my needs, but I can't quite figure out how to begin with CUDA. Any help would be appreciated. If I succeed, I could also open a PR into this repo.
Sorry for the late reply @grantbey. Yes, the lack of documentation is one of the biggest challenges for anyone who wants to build something based on ggml. It's really annoying. I think the best way (or the only way) is to refer to a more mature repo built with ggml, e.g. chatglm.cpp.
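For anyone landing here later: the rough pattern those repos follow is ggml's backend API, where you build the graph with `no_alloc = true`, allocate the tensors in a backend-owned buffer (VRAM when the backend is CUDA), and then run the graph on that backend. A minimal sketch is below — it assumes a ggml revision that ships `ggml-backend.h` and `ggml-cuda.h` and a build with CUDA enabled; exact names and headers have moved around between ggml versions, so treat this as orientation rather than a drop-in implementation:

```cpp
// Minimal sketch: running a small ggml graph on the CUDA backend,
// falling back to CPU if CUDA is unavailable. Based on the pattern in
// ggml's backend examples; verify names against your vendored ggml.
#include <vector>
#include "ggml.h"
#include "ggml-backend.h"
#include "ggml-cuda.h"   // requires a CUDA-enabled ggml build

int main() {
    // 1. Create the backend (device 0), falling back to CPU.
    ggml_backend_t backend = ggml_backend_cuda_init(0);
    if (!backend) backend = ggml_backend_cpu_init();

    // 2. Build tensor/graph *metadata* only (no_alloc = true);
    //    the actual data will live in a backend buffer.
    ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 16 + ggml_graph_overhead(),
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    ggml_context * ctx = ggml_init(params);

    ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    ggml_cgraph * graph = ggml_new_graph(ctx);
    ggml_build_forward_expand(graph, c);

    // 3. Allocate every tensor in the context in one backend buffer
    //    (VRAM for the CUDA backend).
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);

    // 4. Copy input data host -> device.
    std::vector<float> a_data(16, 1.0f), b_data(16, 2.0f);
    ggml_backend_tensor_set(a, a_data.data(), 0, ggml_nbytes(a));
    ggml_backend_tensor_set(b, b_data.data(), 0, ggml_nbytes(b));

    // 5. Run the graph on the backend.
    ggml_backend_graph_compute(backend, graph);

    // 6. Copy the result device -> host.
    std::vector<float> out(ggml_nelements(c));
    ggml_backend_tensor_get(c, out.data(), 0, ggml_nbytes(c));

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(backend);
}
```

The useful property of this design is that steps 2–6 are backend-agnostic: swapping `ggml_backend_cuda_init` for another backend's init function is, in principle, the only CUDA-specific line.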
Thanks @xyzhang626! That's sort of what I've been doing. I'll take a look at the example you gave; hopefully it's easier to follow than the ones I've seen elsewhere. (edit: realised I replied from a different account, oops)
Hey @grantbey @grantnebula, maybe you should look at this, which forks this repo, optimizes the code a lot, and supports CUDA!