An extension to Llama2.java implementation accelerated with GPUs, using TornadoVM and Level Zero

jjfumero/llama2.tornadovm.java

This branch is 20 commits ahead of, 1 commit behind mikepapadim/llama2.tornadovm.java:main.

Latest commit: 109e426 · Oct 8, 2024 (84 commits)

An extension of the Llama2.java implementation, accelerated on GPUs using TornadoVM and the Level Zero JNI

This repo extends https://github.com/mikepapadim/llama2.tornadovm.java and llama2.java with Level Zero JNI Support to run on GPUs.

This project has been tested on Intel HD Graphics (integrated GPUs) and Intel Arc (discrete GPUs).

Prerequisites

  • JDK 21+: This is essential because the project uses Project Panama for native memory allocation.
  • TornadoVM: See the TornadoVM installation instructions for details.
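The JDK requirement can be checked before building. The following is a minimal sketch for bash; the helper names `java_major` and `require_java` are hypothetical and not part of this repository.

```shell
# Extract the major version from a Java version string.
# Handles the modern scheme ("21.0.2") and the legacy one ("1.8.0_292").
java_major() {
    local v="$1"
    v="${v#1.}"          # drop the legacy "1." prefix, if present
    v="${v%%[.+-]*}"     # keep everything before the first '.', '+', or '-'
    printf '%s\n' "$v"
}

# Return non-zero unless the `java` on PATH is at least the given major version.
require_java() {
    local min="$1" raw major
    # `java -version` prints to stderr, e.g.: openjdk version "21.0.2" 2024-01-16
    raw="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    major="$(java_major "$raw")"
    case "$major" in
        ''|*[!0-9]*)
            echo "could not determine the Java version" >&2
            return 1 ;;
    esac
    if [ "$major" -lt "$min" ]; then
        echo "JDK $min+ required, found: $raw" >&2
        return 1
    fi
}
```

Calling `require_java 21` at the top of a build script would stop early when an older JDK is first on the PATH.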

Build

First, build TornadoVM with the SPIR-V backend, which runs on top of Level Zero:

cd tornadovm
./bin/tornadovm-installer --jdk jdk21 --backend=spirv 

Then copy TornadoVM's setvars.sh into this project's folder and source it, so that the environment is set up for both llama2.tornadovm and Level Zero:

cp <tornadovm>/setvars.sh . 
source setvars.sh 

Finally, build this project with Maven:

mvn clean package

Execution

Token files

Just like the original Java implementation, the program requires a tokenizer.bin file and one of the model files from the TinyLlamas collection.

wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
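A quick way to confirm that a download is intact is to print the checkpoint header. The sketch below assumes the legacy llama2.c checkpoint layout, in which the file starts with seven 32-bit integers (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len), and a little-endian host, since `od` decodes in host byte order. The function name `model_header` is hypothetical.

```shell
# Print the seven int32 fields at the start of a llama2.c-style checkpoint.
# Assumption: little-endian host (od decodes integers in host byte order).
model_header() {
    # Read the first 28 bytes as signed 4-byte decimals; word splitting
    # turns od's multi-line output into seven positional parameters.
    # shellcheck disable=SC2046
    set -- $(od -An -td4 -N28 "$1")
    if [ "$#" -ne 7 ]; then
        echo "could not read a 28-byte header" >&2
        return 1
    fi
    printf 'dim=%s hidden_dim=%s n_layers=%s n_heads=%s n_kv_heads=%s vocab_size=%s seq_len=%s\n' "$@"
}
```

For example, `model_header stories15M.bin` prints one line with the model dimensions; a truncated or missing file makes the function return a non-zero status instead.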

How to run

The repository contains a run.sh script for launching the program. It takes the following arguments:

  • -v: version to run (java, levelzero, or tornadovm)
  • -d: device index to run on (optional; for example, 1)
  • The .bin model file

# Run with Level Zero
./run.sh -v levelzero stories15M.bin
# Run in pure Java, without TornadoVM
./run.sh -v java stories15M.bin
# Run with TornadoVM
./run.sh -v tornadovm stories15M.bin

# Change device
# Run with Level Zero on device 1
./run.sh -v levelzero -d 1 stories15M.bin

# Run with TornadoVM on device 1
./run.sh -v tornadovm -d 1 stories15M.bin
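
To compare the three versions on the same model, the invocations above can be wrapped in a loop. This is a sketch only: `run_all` and the `RUN_SH` override are hypothetical names, and the device index is passed only to the levelzero and tornadovm versions because those are the only ones shown with `-d` in this README.

```shell
# Run one model with every version listed in this README.
# Usage: run_all <model.bin> [device-index]
run_all() {
    local model="$1" device="${2:-}" version
    # RUN_SH is a hypothetical override, useful for dry runs; defaults to ./run.sh
    local runner="${RUN_SH:-./run.sh}"
    for version in java levelzero tornadovm; do
        echo "== $version =="
        if [ -n "$device" ] && [ "$version" != "java" ]; then
            "$runner" -v "$version" -d "$device" "$model"
        else
            "$runner" -v "$version" "$model"
        fi || echo "run with '$version' failed" >&2
    done
}
```

For example, `run_all stories15M.bin 1` runs the pure Java version first and then both GPU versions on device 1, continuing even if one of them fails.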

License

MIT
