[RFC] ggml: new backend for API Remoting #17072
kpouget wants to merge 3 commits into ggml-org:master from
Conversation
Very interesting work, thanks for sharing it! Is it possible to get your PoC running on a Linux host with
Not yet, as macOS has been the main target so far, but I'm now working on setting up the Linux environment where I can test this setup. The host side relies on

For macOS, the user-friendly instructions are detailed in the blog post, and I can share the steps to build from source on demand.
I opened the RFC PR on virglrenderer: https://gitlab.freedesktop.org/virg/virglrenderer/-/merge_requests/1584 and the code now works on Linux (tested with the

To reproduce the PoC on Linux (with
Or simply try it with this command:

Note that:
Closing this PR, I opened a new PR #18718 with the v2
Hello, I would like to discuss whether this work could be integrated into the llama.cpp codebase.

The API Remoting backend/frontend allows escaping the VM isolation with the help of virt-gpu paravirtualization (and the virglrenderer library on the host side):

- the ggml-remoting frontend is a GGML API implementation which intercepts the GGML API calls and forwards them to the virt-gpu virtual device
- the ggml-remoting backend is a library loaded by virglrenderer (a PR will be opened soon for discussion), which opens a GGML library and forwards the calls received from virglrenderer

The code is currently a POC; I will refine it after the first round of feedback.
ggml-RPC. The overall idea is the same, but the transport layer is virtualization-aware, which helps limit buffer copies.

The supports_op method is implemented in a hacky way: I copied the ggml-metal definition into the frontend library, and I expose the few properties required to compute it from the ggml-metal backend. IIRC, this was only needed for the micro-benchmark to work correctly (ggml-rpc simply returns true to avoid this bottleneck).

Here is the context behind this PR: