I am not sure if we can implement this change while maintaining compatibility with existing models without breaking mmap, since we need to modify the layout of the tensors. I think that maintaining backwards compatibility with models with split experts is important, we should not ask people to re-download 50GB models, but we may have to disable mmap with old models.
Currently, we store separate tensors for each expert:
https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml.c#L4442-L4455
This leads to a large number of possible "source" tensors for the `_id` ops, which significantly increases the size of `struct ggml_tensor` on the stack:
https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml.h#L573-L576
Additionally, the Metal implementation is currently hacked to support up to 8 experts, and extending it beyond that is not entirely obvious:
https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml-metal.m#L1750-L1759
We should improve this. One possible way is to store the data for all experts in a single tensor and address it with appropriate offsets.