Roadmap June 2023 #1729
-
Are there Metal-like zero-copy mechanisms in either of these frameworks? It seems like a necessity for integrated GPUs (IGPs).
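For what it's worth, Vulkan's closest analog to Metal's shared storage mode seems to be allocating `HOST_VISIBLE | HOST_COHERENT` memory and mapping it, which on an integrated GPU is ordinary system RAM. A minimal sketch (it assumes the first enumerated physical device and queue family 0, and abbreviates error handling):

```cpp
// Sketch: zero-copy host/GPU sharing in Vulkan via host-visible, host-coherent
// memory -- the rough analog of Metal's MTLStorageModeShared on integrated GPUs.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <cstring>

int main() {
    VkInstance instance;
    VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    vkCreateInstance(&ici, nullptr, &instance);

    uint32_t count = 1;
    VkPhysicalDevice phys;
    vkEnumeratePhysicalDevices(instance, &count, &phys);

    float prio = 1.0f;
    VkDeviceQueueCreateInfo qci{VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO};
    qci.queueFamilyIndex = 0; // simplification: assume family 0 exists
    qci.queueCount       = 1;
    qci.pQueuePriorities = &prio;
    VkDeviceCreateInfo dci{VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO};
    dci.queueCreateInfoCount = 1;
    dci.pQueueCreateInfos    = &qci;
    VkDevice dev;
    vkCreateDevice(phys, &dci, nullptr, &dev);

    // Find a memory type the CPU can map and the GPU can read directly.
    VkPhysicalDeviceMemoryProperties mp;
    vkGetPhysicalDeviceMemoryProperties(phys, &mp);
    const VkMemoryPropertyFlags want =
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    uint32_t type = UINT32_MAX;
    for (uint32_t i = 0; i < mp.memoryTypeCount; i++)
        if ((mp.memoryTypes[i].propertyFlags & want) == want) { type = i; break; }
    if (type == UINT32_MAX) return 1;

    // On an IGP this memory is plain system RAM: mapping it gives the CPU a
    // pointer into the same pages the GPU reads -- no staging copy needed.
    VkMemoryAllocateInfo mai{VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO};
    mai.allocationSize  = 1 << 20;
    mai.memoryTypeIndex = type;
    VkDeviceMemory mem;
    vkAllocateMemory(dev, &mai, nullptr, &mem);

    void *ptr = nullptr;
    vkMapMemory(dev, mem, 0, VK_WHOLE_SIZE, 0, &ptr);
    memset(ptr, 0, 1 << 20); // CPU writes visible to the GPU without a copy
    printf("mapped %p\n", ptr);

    vkUnmapMemory(dev, mem);
    vkFreeMemory(dev, mem, nullptr);
    vkDestroyDevice(dev, nullptr);
    vkDestroyInstance(instance, nullptr);
}
```

WebGPU is more restrictive here (buffer mapping goes through `mappedAtCreation` / `mapAsync` staging), so the sketch covers only the Vulkan side.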
Maybe the recent MeZO (forward pass only) training paper is relevant to this effort? https://github.com/princeton-nlp/MeZO
-
@niklaskorz I saw your comment here - any thoughts on the task "Add GPU backend prototypes following the Metal example" with Vulkan / WebGPU?
-
For `llama_state`, is it safe to say that all the states touched in […] are covered? What would become of […]? Context: I plan to put this behind a gRPC service, so per-client state is needed. Currently, state-switching is done via […]. Edit: actually, there are also those metric variables like […].
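Assuming the state (de)serialization API that llama.h shipped around this period (`llama_get_state_size` / `llama_copy_state_data` / `llama_set_state_data`; check the exact signatures against your header version), a minimal per-client switching sketch for the service scenario above:

```cpp
// Minimal per-client state switching sketch for a gRPC-style service.
// Assumes llama.h provides llama_get_state_size / llama_copy_state_data /
// llama_set_state_data (present around this period; verify your version).
#include <cstdint>
#include <unordered_map>
#include <vector>

#include "llama.h"

// One serialized blob per client: KV cache, RNG state, logits, embeddings.
static std::unordered_map<int, std::vector<uint8_t>> g_client_state;

void save_client(llama_context *ctx, int client_id) {
    std::vector<uint8_t> &blob = g_client_state[client_id];
    blob.resize(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, blob.data());
}

void restore_client(llama_context *ctx, int client_id) {
    auto it = g_client_state.find(client_id);
    if (it != g_client_state.end()) {
        llama_set_state_data(ctx, it->second.data());
    }
}
```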
-
Should we consider switching ggml.c to ggml.cpp so that we can leverage templates instead of macros to simplify the code?
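For a sense of what that buys: where C needs one macro expansion per element type, C++ collapses the family into a single template. A generic sketch (the names are illustrative, not ggml's actual API):

```cpp
// Hypothetical template-vs-macro illustration: one definition, the compiler
// stamps out a version per type at each call site.
#include <cstddef>

template <typename T>
static void vec_add(const size_t n, T *z, const T *x, const T *y) {
    for (size_t i = 0; i < n; ++i) z[i] = x[i] + y[i];
}

int main() {
    float  xf[4] = {1, 2, 3, 4}, yf[4] = {4, 3, 2, 1}, zf[4];
    double xd[4] = {1, 2, 3, 4}, yd[4] = {4, 3, 2, 1}, zd[4];
    vec_add(4, zf, xf, yf); // instantiates vec_add<float>
    vec_add(4, zd, xd, yd); // instantiates vec_add<double>
}
```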
-
Hey, could batch inference be a task to add for next month maybe? I think it would really help with using this at scale.
-
https://github.com/ziwang-com/AGM/issues/155
-
Hey @ggerganov, I'm having trouble finding references to text-to-speech in the new roadmap project. Is that still in the plan?
-
The latest update was @PABannier implementing Meta's Encodec codec with `ggml`.
-
New roadmap format as GitHub project: https://github.com/users/ggerganov/projects/7
Outdated below
Previous: Roadmap May 2023
News
The `ggml` project has been funded.
Tasks
Refactoring pass
Didn't get to this in May - should do this in June
"There is a lot of code duplication in
ggml.c
which probably can be simplified with a good set of macros. The goal is to keep the code size manageable, while we avoid reaching "macro hell""Integrate recent efforts for training
Amazing work by @xaedes continues to impress: Train Text from scratch #1652
Ultimately, with the ability to train mini models, I am interested in making a small prototype of the following idea for faster inference: Combine large LLM with small LLM for faster inference #630 (comment)
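The linked idea is essentially what is now called speculative decoding: a small "draft" model proposes a few tokens cheaply, and the large model verifies them, keeping the longest agreed prefix. A toy greedy version with both models replaced by stand-in functions, just to show the draft/verify/accept loop:

```cpp
// Toy illustration of the draft-and-verify idea behind #630. The two model
// functions are stand-ins, not llama.cpp calls, and the acceptance rule is
// the simple greedy variant.
#include <cstdio>
#include <vector>

using Token = int;

// Stand-in predictors; a real version would run two llama.cpp contexts.
Token small_next(const std::vector<Token> &ctx) { return (Token)(ctx.size() % 7); }
Token large_next(const std::vector<Token> &ctx) { return (Token)(ctx.size() % 5); }

int main() {
    std::vector<Token> out = {0};
    const int k = 4; // tokens drafted per round

    while (out.size() < 24) {
        // 1) Draft: extend the context k tokens with the cheap model.
        std::vector<Token> draft = out;
        for (int i = 0; i < k; ++i) draft.push_back(small_next(draft));

        // 2) Verify: accept drafted tokens until the large model disagrees.
        //    (A real implementation scores all k positions in one batch pass.)
        size_t pos = out.size();
        while (pos < draft.size()) {
            std::vector<Token> prefix(draft.begin(), draft.begin() + pos);
            if (large_next(prefix) != draft[pos]) break;
            ++pos;
        }

        // 3) Keep the verified prefix plus one token from the large model,
        //    so each round always makes progress.
        out.assign(draft.begin(), draft.begin() + pos);
        out.push_back(large_next(out));
    }
    for (Token t : out) printf("%d ", t);
    printf("\n");
}
```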
Integrate recent efforts in improving the threading of `ggml`
Some very good points and analysis in Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632
Will look into integrating most of the stuff into `ggml` to try and improve the CPU performance further
Extend Metal shaders to support other quantizations + optimize performance
Currently, the Metal implementation supports just `Q4_0` and `F16`. Also, the existing implementation is probably far from optimal. More info: llama : Metal inference #1642
Very good field for contributions
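For prospective contributors, the CPU-side definition of what a `Q4_0` kernel computes is compact: 32 weights per block share one scale d, and each weight is a 4-bit value q with x = d * (q - 8). A sketch, with two simplifications called out in the comments:

```cpp
// Q4_0 dequantization sketch -- the scheme a new Metal (or Vulkan) shader has
// to mirror. Two simplifications: ggml stores the scale d as fp16 (float
// here), and ggml's actual in-block nibble ordering differs from the
// sequential layout used below.
#include <cstdint>

constexpr int QK4_0 = 32;

struct BlockQ4_0 {
    float   d;             // per-block scale
    uint8_t qs[QK4_0 / 2]; // 32 4-bit quants, two packed per byte
};

void dequantize_q4_0(const BlockQ4_0 *blocks, float *out, int nblocks) {
    for (int b = 0; b < nblocks; ++b) {
        for (int i = 0; i < QK4_0 / 2; ++i) {
            const uint8_t q = blocks[b].qs[i];
            out[b * QK4_0 + 2 * i + 0] = blocks[b].d * (float)((q & 0x0F) - 8);
            out[b * QK4_0 + 2 * i + 1] = blocks[b].d * (float)((q >> 4)   - 8);
        }
    }
}
```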
Implement inference of new models
There are already some very interesting models that should be supported by `ggml`:
Segment Anything Model (SAM)
Still working on the Encoder - progress is a bit slow due to several new operators involved, but I think it is slowly working out: examples : add sample SAM inference ggml#74
Falcon
Bark (text-to-speech)
Advance the community effort for unified `ggml` model format
This work has been recently initiated and aims to provide a future-proof file format for `ggml` models: ggml : unified file format ggml#220
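To make the design space concrete, here is a generic sketch of the pieces such a container format has to pin down: magic, version, and a tensor index with names, shapes, dtypes, and data offsets. Every field choice below is invented for illustration and is not the proposal in ggml#220:

```cpp
// Generic model container header writer -- illustrative only.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct TensorMeta {
    std::string name;
    uint32_t    n_dims;
    uint64_t    shape[4];
    uint32_t    dtype;  // illustrative codes: 0 = f32, 1 = f16, 2 = q4_0
    uint64_t    offset; // byte offset of this tensor within the data section
};

void write_header(std::FILE *f, const std::vector<TensorMeta> &tensors) {
    const uint32_t magic   = 0x31464D47; // "GMF1", an invented tag
    const uint32_t version = 1;
    std::fwrite(&magic,   sizeof magic,   1, f);
    std::fwrite(&version, sizeof version, 1, f);
    const uint32_t n_tensors = (uint32_t)tensors.size();
    std::fwrite(&n_tensors, sizeof n_tensors, 1, f);
    for (const TensorMeta &t : tensors) {
        const uint32_t len = (uint32_t)t.name.size();
        std::fwrite(&len, sizeof len, 1, f);
        std::fwrite(t.name.data(), 1, len, f);
        std::fwrite(&t.n_dims, sizeof t.n_dims, 1, f);
        std::fwrite(t.shape, sizeof t.shape[0], t.n_dims, f);
        std::fwrite(&t.dtype,  sizeof t.dtype,  1, f);
        std::fwrite(&t.offset, sizeof t.offset, 1, f);
    }
}
```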
Add `llama_state`
See past Roadmaps - have been postponing this for quite some time. See Roadmap May 2023 #1220 (reply in thread) if interested in giving it a try
Add GPU backend prototypes following the Metal example
For example, it would be interesting if we can add WebGPU or Vulkan backends in a similar way as we did with Metal. I'm completely unfamiliar with the details of these frameworks, but I'm hoping that people might be interested in giving it a try.