[BOUNTY - $100] Support running any model from huggingface #357
Comments
Huggingface transformers can run GGUF files, but it first dequantizes them to fp32, defeating the purpose altogether. We could run these directly on llama.cpp instead of using the HF/torch inference engine, but I'm not quite sure about that yet. PS: #335 is still WIP, but this feature can probably be based on it; I can work on accelerating the progress as far as llama.cpp inference is concerned.
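For reference, a minimal sketch of the transformers GGUF path described above (the repo id and filename are placeholders, not from this thread); the weights are dequantized on load, so memory use follows the unquantized model rather than the GGUF file size:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo/filename for illustration only.
model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# transformers dequantizes the GGUF tensors when building the torch model.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```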
@AlexCheema I would like to work on this. Please assign it to me.
hi @AlexCheema, llama.cpp seems to natively support sharding via gguf-split. Could we just use that to shard the downloaded GGUF and run it on the connected nodes? I also feel we will need to do this in llama.cpp, considering the Huggingface approach is to dequantise the weights, which is suboptimal.
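A rough sketch of driving gguf-split from Python, under the assumption that the llama.cpp build ships the `llama-gguf-split` binary with a `--split-max-size` option; the paths and the 2G shard size are made up for illustration:

```python
import subprocess

# Hypothetical paths; the binary may be named gguf-split or llama-gguf-split
# depending on the llama.cpp build.
GGUF_IN = "models/llama-3-8b.Q4_K_M.gguf"
GGUF_OUT_PREFIX = "models/llama-3-8b.Q4_K_M-shard"

# Split the GGUF into shards of at most 2 GB each, to be distributed across nodes.
subprocess.run(
    ["llama-gguf-split", "--split", "--split-max-size", "2G", GGUF_IN, GGUF_OUT_PREFIX],
    check=True,
)
```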
exo supports multiple inference backends through the
I'm not sure if there is a way to run .gguf files with PyTorch directly. Huggingface can do it, but the weights would have to be dequantised. Since there already is a Huggingface inference engine, I'd base this feature on that until llama.cpp inference comes around. How does this sound?
Sure, let's start with that.
I'm using this library to parse the GGUF files; it takes the raw byte tensors and converts them to numpy arrays. If you intend to load the weights into PyTorch, you could just convert the numpy arrays to torch tensors.
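As an illustration of that flow, a small sketch assuming the `gguf` Python package that ships with llama.cpp (the file path is a placeholder):

```python
import torch
from gguf import GGUFReader

# Placeholder path to a downloaded GGUF checkpoint.
reader = GGUFReader("models/llama-3-8b.Q4_K_M.gguf")

state_dict = {}
for tensor in reader.tensors:
    # tensor.data is a numpy view into the file; .copy() gives a writable array.
    # For quantized tensor types this is the raw packed block data, so it would
    # still need dequantizing before it can be used as torch weights.
    state_dict[tensor.name] = torch.from_numpy(tensor.data.copy())
```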
@bayedieng there's also a llama.cpp-to-torch converter.
MLX has documentation on using GGUF files for generation; I will integrate this into exo for now.
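A minimal sketch of loading GGUF weights with MLX, assuming `mx.load` with `return_metadata=True` as in the MLX docs; the filename is a placeholder, and only the quantization types MLX understands will load directly:

```python
import mlx.core as mx

# Placeholder filename; mx.load recognises the .gguf extension and returns the
# tensors as a dict of mx.array plus the GGUF metadata key/value pairs.
weights, metadata = mx.load(
    "models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", return_metadata=True
)

print(metadata.get("general.architecture"))
print(next(iter(weights)))
```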
Like this: https://x.com/reach_vb/status/1846545312548360319
This should work out of the box with #139