Roadmap June 2023 #1729
-
Are there Metal-like zero-copy mechanisms in either of these frameworks? It seems like a necessity for integrated GPUs (IGPs).
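For what it's worth, Vulkan's closest analog to Metal's shared storage mode seems to be allocating `HOST_VISIBLE | HOST_COHERENT` memory and mapping it, which on an integrated GPU is ordinary system RAM. A minimal sketch (it assumes the first enumerated physical device and queue family 0, and abbreviates error handling):

```cpp
// Sketch: zero-copy host/GPU sharing in Vulkan via host-visible, host-coherent
// memory -- the rough analog of Metal's MTLStorageModeShared on integrated GPUs.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <cstring>

int main() {
    VkInstance instance;
    VkInstanceCreateInfo ici{VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
    vkCreateInstance(&ici, nullptr, &instance);

    uint32_t count = 1;
    VkPhysicalDevice phys;
    vkEnumeratePhysicalDevices(instance, &count, &phys);

    float prio = 1.0f;
    VkDeviceQueueCreateInfo qci{VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO};
    qci.queueFamilyIndex = 0; // simplification: assume family 0 exists
    qci.queueCount       = 1;
    qci.pQueuePriorities = &prio;
    VkDeviceCreateInfo dci{VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO};
    dci.queueCreateInfoCount = 1;
    dci.pQueueCreateInfos    = &qci;
    VkDevice dev;
    vkCreateDevice(phys, &dci, nullptr, &dev);

    // Find a memory type the CPU can map and the GPU can read directly.
    VkPhysicalDeviceMemoryProperties mp;
    vkGetPhysicalDeviceMemoryProperties(phys, &mp);
    const VkMemoryPropertyFlags want =
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
    uint32_t type = UINT32_MAX;
    for (uint32_t i = 0; i < mp.memoryTypeCount; i++)
        if ((mp.memoryTypes[i].propertyFlags & want) == want) { type = i; break; }
    if (type == UINT32_MAX) return 1;

    // On an IGP this memory is plain system RAM: mapping it gives the CPU a
    // pointer into the same pages the GPU reads -- no staging copy needed.
    VkMemoryAllocateInfo mai{VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO};
    mai.allocationSize  = 1 << 20;
    mai.memoryTypeIndex = type;
    VkDeviceMemory mem;
    vkAllocateMemory(dev, &mai, nullptr, &mem);

    void *ptr = nullptr;
    vkMapMemory(dev, mem, 0, VK_WHOLE_SIZE, 0, &ptr);
    memset(ptr, 0, 1 << 20); // CPU writes visible to the GPU without a copy
    printf("mapped %p\n", ptr);

    vkUnmapMemory(dev, mem);
    vkFreeMemory(dev, mem, nullptr);
    vkDestroyDevice(dev, nullptr);
    vkDestroyInstance(instance, nullptr);
}
```

WebGPU is more restrictive here (buffer mapping goes through `mappedAtCreation` / `mapAsync` staging), so the sketch covers only the Vulkan side.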
Maybe the recent MeZO (forward pass only) training paper is relevant to this effort? https://github.com/princeton-nlp/MeZO
-
@niklaskorz I saw your comment here - any thoughts on the task "Add GPU backend prototypes following the Metal example" with Vulkan / WebGPU?
-
For `llama_state`, is it safe to say that all the states touched in […] are covered? What would become of […]? Context: I plan to put this behind a gRPC service, so per-client state is needed. Currently, state-switching is done via […]. Edit: actually, there are also those metric variables like […].
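Assuming the state (de)serialization API that llama.h shipped around this period (`llama_get_state_size` / `llama_copy_state_data` / `llama_set_state_data`; check the exact signatures against your header version), a minimal per-client switching sketch for the service scenario above:

```cpp
// Minimal per-client state switching sketch for a gRPC-style service.
// Assumes llama.h provides llama_get_state_size / llama_copy_state_data /
// llama_set_state_data (present around this period; verify your version).
#include <cstdint>
#include <unordered_map>
#include <vector>

#include "llama.h"

// One serialized blob per client: KV cache, RNG state, logits, embeddings.
static std::unordered_map<int, std::vector<uint8_t>> g_client_state;

void save_client(llama_context *ctx, int client_id) {
    std::vector<uint8_t> &blob = g_client_state[client_id];
    blob.resize(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, blob.data());
}

void restore_client(llama_context *ctx, int client_id) {
    auto it = g_client_state.find(client_id);
    if (it != g_client_state.end()) {
        llama_set_state_data(ctx, it->second.data());
    }
}
```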
-
Should we consider switching ggml.c to ggml.cpp so that we can leverage templates instead of macros to simplify the code?
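For a sense of what that buys: where C needs one macro expansion per element type, C++ collapses the family into a single template. A generic sketch (the names are illustrative, not ggml's actual API):

```cpp
// Hypothetical template-vs-macro illustration: one definition, the compiler
// stamps out a version per type at each call site.
#include <cstddef>

template <typename T>
static void vec_add(const size_t n, T *z, const T *x, const T *y) {
    for (size_t i = 0; i < n; ++i) z[i] = x[i] + y[i];
}

int main() {
    float  xf[4] = {1, 2, 3, 4}, yf[4] = {4, 3, 2, 1}, zf[4];
    double xd[4] = {1, 2, 3, 4}, yd[4] = {4, 3, 2, 1}, zd[4];
    vec_add(4, zf, xf, yf); // instantiates vec_add<float>
    vec_add(4, zd, xd, yd); // instantiates vec_add<double>
}
```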
-
Hey, could batch inference be a task to add for next month maybe? I think it would really help with using this at scale.
-
https://github.com/ziwang-com/AGM/issues/155
-
Hey @ggerganov, I'm having trouble finding references to text-to-speech in the new roadmap project. Is that still in the plan?
-
The latest update was @PABannier implementing Meta's Encodec codec with `ggml`.
-
New roadmap format as GitHub project: https://github.com/users/ggerganov/projects/7
Outdated below
Previous: Roadmap May 2023
News
The `ggml` project has been funded.
Tasks
Refactoring pass
Didn't get to this in May - should do this in June
"There is a lot of code duplication in
ggml.c
which probably can be simplified with a good set of macros. The goal is to keep the code size manageable, while we avoid reaching "macro hell""Integrate recent efforts for training
Amazing work by @xaedes continues to impress: Train Text from scratch #1652
Ultimately, with the ability to train mini models, I am interested in making a small prototype of the following idea for faster inference: Combine large LLM with small LLM for faster inference #630 (comment)
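The linked idea is essentially what is now called speculative decoding: a small "draft" model proposes a few tokens cheaply, and the large model verifies them, keeping the longest agreed prefix. A toy greedy version with both models replaced by stand-in functions, just to show the draft/verify/accept loop:

```cpp
// Toy illustration of the draft-and-verify idea behind #630. The two model
// functions are stand-ins, not llama.cpp calls, and the acceptance rule is
// the simple greedy variant.
#include <cstdio>
#include <vector>

using Token = int;

// Stand-in predictors; a real version would run two llama.cpp contexts.
Token small_next(const std::vector<Token> &ctx) { return (Token)(ctx.size() % 7); }
Token large_next(const std::vector<Token> &ctx) { return (Token)(ctx.size() % 5); }

int main() {
    std::vector<Token> out = {0};
    const int k = 4; // tokens drafted per round

    while (out.size() < 24) {
        // 1) Draft: extend the context k tokens with the cheap model.
        std::vector<Token> draft = out;
        for (int i = 0; i < k; ++i) draft.push_back(small_next(draft));

        // 2) Verify: accept drafted tokens until the large model disagrees.
        //    (A real implementation scores all k positions in one batch pass.)
        size_t pos = out.size();
        while (pos < draft.size()) {
            std::vector<Token> prefix(draft.begin(), draft.begin() + pos);
            if (large_next(prefix) != draft[pos]) break;
            ++pos;
        }

        // 3) Keep the verified prefix plus one token from the large model,
        //    so each round always makes progress.
        out.assign(draft.begin(), draft.begin() + pos);
        out.push_back(large_next(out));
    }
    for (Token t : out) printf("%d ", t);
    printf("\n");
}
```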
Integrate recent efforts in improving the threading of `ggml`
Some very good points and analysis in Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632
Will look into integrating most of the stuff into `ggml` to try and improve the CPU performance further
Extend Metal shaders to support other quantizations + optimize performance
Currently, the Metal implementation supports just `Q4_0` and `F16`. Also, the existing implementation is probably far from optimal. More info: llama : Metal inference #1642
Very good field for contributions
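For prospective contributors, the CPU-side definition of what a `Q4_0` kernel computes is compact: 32 weights per block share one scale d, and each weight is a 4-bit value q with x = d * (q - 8). A sketch, with two simplifications called out in the comments:

```cpp
// Q4_0 dequantization sketch -- the scheme a new Metal (or Vulkan) shader has
// to mirror. Two simplifications: ggml stores the scale d as fp16 (float
// here), and ggml's actual in-block nibble ordering differs from the
// sequential layout used below.
#include <cstdint>

constexpr int QK4_0 = 32;

struct BlockQ4_0 {
    float   d;             // per-block scale
    uint8_t qs[QK4_0 / 2]; // 32 4-bit quants, two packed per byte
};

void dequantize_q4_0(const BlockQ4_0 *blocks, float *out, int nblocks) {
    for (int b = 0; b < nblocks; ++b) {
        for (int i = 0; i < QK4_0 / 2; ++i) {
            const uint8_t q = blocks[b].qs[i];
            out[b * QK4_0 + 2 * i + 0] = blocks[b].d * (float)((q & 0x0F) - 8);
            out[b * QK4_0 + 2 * i + 1] = blocks[b].d * (float)((q >> 4)   - 8);
        }
    }
}
```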
Implement inference of new models
There are already some very interesting models that should be supported by `ggml`:
Segment Anything Model (SAM)
Still working on the Encoder - progress is a bit slow due to several new operators involved, but I think it is slowly working out: examples : add sample SAM inference ggml#74
Falcon
Bark (text-to-speech)
Advance the community effort for unified `ggml` model format
This work has been recently initiated and aims to provide a future-proof file format for `ggml` models: ggml : unified file format ggml#220
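To make the design space concrete, here is a generic sketch of the pieces such a container format has to pin down: magic, version, and a tensor index with names, shapes, dtypes, and data offsets. Every field choice below is invented for illustration and is not the proposal in ggml#220:

```cpp
// Generic model container header writer -- illustrative only.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct TensorMeta {
    std::string name;
    uint32_t    n_dims;
    uint64_t    shape[4];
    uint32_t    dtype;  // illustrative codes: 0 = f32, 1 = f16, 2 = q4_0
    uint64_t    offset; // byte offset of this tensor within the data section
};

void write_header(std::FILE *f, const std::vector<TensorMeta> &tensors) {
    const uint32_t magic   = 0x31464D47; // "GMF1", an invented tag
    const uint32_t version = 1;
    std::fwrite(&magic,   sizeof magic,   1, f);
    std::fwrite(&version, sizeof version, 1, f);
    const uint32_t n_tensors = (uint32_t)tensors.size();
    std::fwrite(&n_tensors, sizeof n_tensors, 1, f);
    for (const TensorMeta &t : tensors) {
        const uint32_t len = (uint32_t)t.name.size();
        std::fwrite(&len, sizeof len, 1, f);
        std::fwrite(t.name.data(), 1, len, f);
        std::fwrite(&t.n_dims, sizeof t.n_dims, 1, f);
        std::fwrite(t.shape, sizeof t.shape[0], t.n_dims, f);
        std::fwrite(&t.dtype,  sizeof t.dtype,  1, f);
        std::fwrite(&t.offset, sizeof t.offset, 1, f);
    }
}
```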
Add `llama_state`
See past Roadmaps - have been postponing this for quite some time. See Roadmap May 2023 #1220 (reply in thread) if interested in giving it a try
Add GPU backend prototypes following the Metal example
For example, it would be interesting if we can add WebGPU or Vulkan backends in a similar way as we did with Metal. I'm completely unfamiliar with the details of these frameworks, but I'm hoping that people might be interested in giving it a try.