Binding to llama.cpp #126
Replies: 9 comments 1 reply
-
Maybe I misunderstood what you were saying originally.
I absolutely do think a binding should exist so people can use it if they want to. More options are always good. What I didn't agree with was the "struggling to keep up, always second best, will be used only reluctantly" part of your issue. I also wouldn't like to see llama-rs change direction to only become a thin shell around llama.cpp.
Like you said, llama.cpp and GGML are messy C++ libraries. Rust users usually have a different philosophy of trying for correct behavior, safety, etc. The more of a project that lives in that messy C code, the less the Rust philosophy can be applied. People who are interested in ML/LLMs can currently learn something about that by looking at the current code since it's setting up the structure of the model, defining the ops that occur, etc. Even though the heavy duty math is happening in GGML, there's still a lot that can be learned looking in the llama-rs code. If it was just a front end for llama.cpp, then all you'd find is details about how to interface with that library. That's a whole lot less interesting.
I certainly hope so! There are already a number of Rust projects in progress attempting to do something similar to GGML.
-
I share everything @KerfuffleV2 said. Even if llama.cpp is just a little code compared to ggml, it's where all the interesting stuff happens. Since the start (when this was a pretty literal port) this project has been gradually moving away from llama.cpp in several ways. I think most of the changes have been steps in the right direction. The llama.cpp ecosystem moves fast, but it's also sloppy. People are forking left and right instead of trying to evolve a cohesive library and follow good practices. You also originally mentioned that binding llama.cpp directly would free development effort for other kinds of improvements. But may I ask, what improvements exactly? Anything that is a substantial improvement would require contributing to the C++ codebase directly, so it would not be a Rust project anymore. Things like #14 would not be possible in this development model. If this is just a novelty thing and nobody cares about llama a few months from now, then we all would've had fun. I don't mind who has more or fewer features. But if a few months from now it turns out this "inference at the edge using ggml" trend is still going, then people are going to want to build projects on top of it, and many are going to appreciate being able to do it in Rust, where building a robust application on top of llama will be just a
-
As a Node.js developer, I'd like to see more options here. I'm willing to provide both solutions, shipped as adapters for llama-rs and llama-sys. I also noticed that an issue said we will probably get SafeTensors as an alternative to GGML. Is that going to happen in the future?
-
I think the problem is that we need to get more attention from the open source community. Right now everyone knows llama.cpp, but only some of them know llama-rs.
-
SafeTensors is just a file format for storing tensors. Loading tensors or settings like hyperparameters, vocab, whatever is generally not very difficult. So you could pretty easily convert existing GGML models to SafeTensors format and possibly vice versa. The complicated thing would be the format the actual tensors are actually stored in. For example, GGML has its own implementation of quantization so anything working with those tensors would have to be able to deal with that format. So just to be clear in the summary: SafeTensors isn't anything more than a storage format for tensors, it doesn't have anything to do with actually running the model. GGML is both a storage format and a library for actually using them.
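To make the "just a storage format" point concrete, here is a minimal sketch of the SafeTensors container layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The function names `split_safetensors` and `build_example` and the tensor name `w` are made up for illustration, and the JSON is deliberately left as text rather than parsed. Note that nothing here runs a model or knows about quantization.

```rust
/// Splits a SafeTensors-style buffer into (JSON header text, raw tensor bytes).
/// Returns None if the buffer is too short or the header is not valid UTF-8.
fn split_safetensors(buf: &[u8]) -> Option<(&str, &[u8])> {
    // The first 8 bytes hold the header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(buf.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes) as usize;

    // The JSON header follows immediately; the rest is tensor data.
    let header_end = 8usize.checked_add(header_len)?;
    let header = std::str::from_utf8(buf.get(8..header_end)?).ok()?;
    Some((header, &buf[header_end..]))
}

/// Builds a tiny in-memory file holding one hypothetical F32 tensor named "w".
fn build_example() -> Vec<u8> {
    let header = r#"{"w":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut buf = Vec::new();
    buf.extend_from_slice(&(header.len() as u64).to_le_bytes());
    buf.extend_from_slice(header.as_bytes());
    for v in &[1.0f32, 2.0f32] {
        // Tensor values are stored as plain little-endian bytes.
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}
```

Calling `split_safetensors(&build_example())` yields the header text plus 8 bytes of data. Interpreting those bytes (for example, as a GGML quantized block instead of plain F32) is exactly the part the container does not help with.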
There are also just generally fewer Rust developers. A big factor in who can participate will be whether they know the language the respective application is written in. C++ developers aren't really going to be contributing to Rust projects much, and vice versa.
-
I think my feeling is best understood by a comparison to opencv. This is an excellent project: https://github.com/twistedfall/opencv-rust I use it a lot. It's not a pure Rust project, and the various attempts at a pure Rust computer vision stack, while interesting, are basically immature. Re-implementing all of that in Rust instead of building applications using it is something I (and, it seems, most other people) am not really interested in, because the marginal benefit of the 'pure Rust' stack is significantly outweighed by the effort required to build an equivalent library. I hear the other arguments in this thread, and they're totally fair points. I'm just saying that's my take. Solving the 'can do inference' problem and then moving on to 'and then do interesting things with it as a dependency' is more interesting to me personally.
-
FYI: I just saw
-
That's 100% valid. Different people have different goals and preferences. For me, interacting with this project is mostly to learn about how the LLM works internally. Also, I don't really like the idea of being limited to what the interface supports. With llama-rs, I actually have the capability to dig into the code and add functionality that didn't already exist. When interfacing with a wrapper to llama.cpp itself, that ability is going to be much more limited (I have no interest in learning/writing C++). Of course, I also want to enable interesting things just with the results as well, but that might not be possible if I run into a limitation that's out of my hands to deal with.
-
Apologies for the late reply to this - I've been enjoying my long weekend 😅 I wrote up the project's thoughts on this in the README, which I'll quote here:
but speaking more personally:
I'm not opposed to binding If we were to bind it, I suspect the interface would be different enough (especially around
-
Forking off of #124 as a discussion instead.
tldr; Is it useful to have a binding of llama.cpp?
People seem to have mixed feelings.
My personal opinion is "be pragmatic". The llama.cpp library and ggml are both messy C++ libraries which are changing relatively quickly; given that ggml changes are solved upstream in llama.cpp very quickly, what is the benefit of binding to ggml directly? Is there a future in which it is replaced by a rust implementation for a 'full stack rust' solution?
I dunno. At any rate, right now there seem to be a few things which are lagging here, in terms of implementation.
Existing work:
example binding cxx: https://github.com/iacore/llama-sys
example binding cmake: https://github.com/shadowmint/llama-sys/
example higher level api: https://github.com/shadowmint/llama-rs/
example higher level api (? WIP?): https://github.com/iacore/llama-rs
^ These are all pretty trivial (< 1 day's worth of effort to set up); bringing some or all of it into this crate might help?
...but, I'm not going to die on a hill about it. If people prefer not to, there's no specific reason to bind it into this crate. I just thought it would be nice to have everything in one central place.
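For anyone unfamiliar with what such a binding involves, the usual pattern is a raw `extern "C"` declaration (the "-sys" layer) plus a small safe wrapper on top (the higher-level API). The sketch below shows only that pattern. It uses `strlen` from the C standard library as a stand-in, because it links without extra setup; it is not llama.cpp's API, and the wrapper names `c_string_len` and `str_len_via_c` are made up for illustration.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// The "-sys" layer: a raw declaration of a C function.
// `strlen` is a stand-in here; a real binding would declare the library's own functions.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// The safe layer: callers never touch raw pointers.
fn c_string_len(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated pointer for the duration of the call.
    unsafe { strlen(s.as_ptr()) }
}

/// Convenience wrapper for Rust strings; returns None if `s` contains an interior NUL byte.
fn str_len_via_c(s: &str) -> Option<usize> {
    let owned = CString::new(s).ok()?;
    Some(c_string_len(&owned))
}
```

The trade-off discussed in this thread shows up even at this scale: the wrapper can make the call safe, but everything the function actually does lives on the C side of the boundary.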