Unofficial PyTorch implementation of the Llama 3 models.
This project contains implementations of the following components:
- RMSNorm
- Rotary positional embeddings (RoPE)
- FeedForward Network
- KV Cache
- Self-attention with grouped-query attention (GQA)
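As an illustration of the first component in the list, here is a minimal RMSNorm sketch in PyTorch. It follows the standard formulation (scale each vector by the reciprocal of its root mean square, then apply a learned per-channel gain) and is not necessarily how this project implements it; the class layout, the `eps` default, and the float32 upcast are assumptions.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (illustrative sketch)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # assumed default; keeps rsqrt finite for all-zero inputs
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize in float32 for numerical stability, then cast back.
        x_fp32 = x.float()
        inv_rms = torch.rsqrt(x_fp32.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x_fp32 * inv_rms).type_as(x) * self.weight


if __name__ == "__main__":
    norm = RMSNorm(dim=8)
    x = torch.randn(2, 3, 8)
    print(norm(x).shape)
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, which makes it slightly cheaper per token.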
You can run the following Llama models on a CPU:
- llama3.2-1B --> 4 GB RAM --> 16-bit precision
- llama3.1-8B --> 8 GB RAM --> 8-bit precision
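The RAM figures above can be sanity-checked with a rough weights-only estimate: parameter count times bits per parameter. The sketch below assumes nominal parameter counts (1e9 and 8e9; the real checkpoints are somewhat larger) and ignores the KV cache, activations, and runtime overhead, so treat the results as lower bounds rather than measurements. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Nominal parameter counts (assumption); precision as listed above.
    models = [("llama3.2-1B", 1e9, 16), ("llama3.1-8B", 8e9, 8)]
    for name, params, bits in models:
        gib = weight_memory_gib(params, bits)
        print(f"{name}: ~{gib:.1f} GiB of weights at {bits}-bit")
```

By this estimate the 8B model at 8-bit precision sits close to the 8 GB figure, so closing other memory-heavy applications before running it is a reasonable precaution.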
poetry install
bash download_models.sh
OR
poetry install --only dev
llama model download --source meta --model-id <your-model-id> --meta-url <meta-email-url>
python main.py