llama2.c is a very simple implementation for running inference on models with a Llama2-like, transformer-based LLM architecture.
This is a pure C# implementation of the same thing. It is optimized for speed and very simple to understand and modify.
Requires .NET 7 or higher.
- First, put the stories15M.bin file in the same directory as the executable. You can download it from here.
- Get the tokenizer from here and put it in the same directory as the executable.
dotnet build -c Release
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin
.\bin\Release\net7.0\llama2.cs.exe stories15M.bin -i "A long time ago a"
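If the executable fails to load the model, it can help to check that the checkpoint file downloaded correctly. The sketch below is not part of this repository: `CheckpointInspector` and `CheckpointHeader` are hypothetical names, and it assumes the file follows the original llama2.c checkpoint layout, which starts with seven little-endian 32-bit integers describing the model.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical type for illustration; field order assumes the llama2.c Config struct.
public readonly record struct CheckpointHeader(
    int Dim, int HiddenDim, int NLayers, int NHeads,
    int NKvHeads, int VocabSize, int SeqLen);

public static class CheckpointInspector
{
    // Reads the seven header integers from the start of the stream.
    // BinaryReader always reads little-endian values.
    public static CheckpointHeader ReadHeader(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        int dim = reader.ReadInt32();
        int hiddenDim = reader.ReadInt32();
        int nLayers = reader.ReadInt32();
        int nHeads = reader.ReadInt32();
        int nKvHeads = reader.ReadInt32();
        // In llama2.c a negative vocab size signals unshared classifier weights;
        // the raw value is kept here so that the sign stays visible.
        int vocabSize = reader.ReadInt32();
        int seqLen = reader.ReadInt32();
        return new CheckpointHeader(dim, hiddenDim, nLayers, nHeads, nKvHeads, vocabSize, seqLen);
    }

    public static void Main(string[] args)
    {
        // Defaults to the checkpoint name used in this README.
        string path = args.Length > 0 ? args[0] : "stories15M.bin";
        using var file = File.OpenRead(path);
        Console.WriteLine(ReadHeader(file));
    }
}
```

A truncated download shorter than 28 bytes makes `ReadInt32` throw an `EndOfStreamException`, and implausible values (for example a zero or very large layer count) suggest that the file is not a llama2.c-style checkpoint.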
- Inference with Llama2 checkpoints
- Use high-performance C# types from .NET 8?
- Add training functionality