An extension of the Llama2.java implementation, accelerated on GPUs using TornadoVM and Level Zero JNI

This repository extends https://github.com/mikepapadim/llama2.tornadovm.java and llama2.java with Level Zero JNI support for running on GPUs.

This project has been tested on Intel HD Graphics (integrated GPUs) and Intel Arc (discrete GPUs).

Prerequisites

JDK 21+: Required, because the project uses Project Panama for native memory allocation.
TornadoVM: Detailed installation instructions are available in the TornadoVM documentation.
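
Before building, it can help to confirm that the JDK on the PATH meets the JDK 21+ requirement. The following is a minimal sketch, not part of this repository: the helper names java_major_version and meets_jdk_requirement are hypothetical, and the parsing assumes the usual `java -version` output, where the version appears as a quoted string.

```shell
#!/usr/bin/env bash
# Extract the major version from a Java version string.
# Handles the modern scheme ("21.0.2") and the legacy one ("1.8.0_292").
java_major_version() {
    local version="$1"
    local major="${version%%.*}"
    if [ "$major" = "1" ]; then
        # Legacy scheme: the major version is the second component.
        local rest="${version#1.}"
        major="${rest%%.*}"
    fi
    # Strip any non-numeric suffix such as "-ea".
    echo "${major%%[!0-9]*}"
}

# Succeed only if the given version string is JDK 21 or newer.
meets_jdk_requirement() {
    local major
    major="$(java_major_version "$1")"
    [ -n "$major" ] && [ "$major" -ge 21 ]
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; pick the quoted version string.
    installed="$(java -version 2>&1 | sed -n -E 's/.*version "([^"]+)".*/\1/p' | head -n 1)"
    if meets_jdk_requirement "$installed"; then
        echo "JDK $installed satisfies the JDK 21+ requirement"
    else
        echo "JDK '$installed' does not satisfy the JDK 21+ requirement"
    fi
fi
```

The check only inspects the `java` binary found on the PATH, so it should be run in the same shell session that is later used for the build.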

Build

First, build TornadoVM with the SPIR-V backend, which is the backend used for Level Zero:

cd tornadovm
./bin/tornadovm-installer --jdk jdk21 --backend=spirv

Then copy setvars.sh from the TornadoVM directory into the root folder of this project and source it:

cp <tornadovm>/setvars.sh . 
source setvars.sh

Finally, build this project with Maven:

mvn clean package

Execution

Token files

Like the original Java implementation, the program requires a tokenizer.bin file and one of the model files from the TinyLlamas collection:

wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
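
An interrupted download can leave an empty or missing file behind, so a quick check before running may save some debugging. This is a small sketch under that assumption; the helper name missing_files is hypothetical and not part of this repository, and the example only checks the tokenizer and the smallest model.

```shell
#!/usr/bin/env bash
# Print the name of every required file that is missing or empty
# in the given directory. Usage: missing_files <dir> <file>...
missing_files() {
    local dir="$1"
    shift
    local f
    for f in "$@"; do
        # -s is true only if the file exists and is non-empty.
        [ -s "$dir/$f" ] || echo "$f"
    done
}

missing="$(missing_files . tokenizer.bin stories15M.bin)"
if [ -n "$missing" ]; then
    echo "Missing or empty files:"
    echo "$missing"
else
    echo "tokenizer.bin and stories15M.bin are present"
fi
```

Add stories42M.bin or stories110M.bin to the argument list when using the larger models.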

How to run

The repository contains a run.sh script for running the model. The script takes the following arguments:

-v: Version to run (java, levelzero, tornadovm)
-d: Device index to run on (for example, 1)
The .bin model file

# Run the model with Level Zero
./run.sh -v levelzero stories15M.bin
# Run in pure Java, without TornadoVM
./run.sh -v java stories15M.bin
# Run with TornadoVM
./run.sh -v tornadovm stories15M.bin

# Change device
# Run the model with Level Zero on device 1
./run.sh -v levelzero -d 1 stories15M.bin

# Run with TornadoVM on device 1
./run.sh -v tornadovm -d 1 stories15M.bin
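
To illustrate the command-line shape used in the examples above, here is a hypothetical sketch of how a script could parse `-v` and `-d` with getopts. It is not the actual run.sh, whose internals may differ; the function name parse_run_args and the default values (java, device 0) are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing; the real run.sh may differ.
parse_run_args() {
    local version="java"   # assumed default, not confirmed by this README
    local device=0         # assumed default device index
    local opt
    local OPTIND=1         # reset so the function can be called repeatedly
    while getopts "v:d:" opt; do
        case "$opt" in
            v) version="$OPTARG" ;;
            d) device="$OPTARG" ;;
            *) echo "usage: run.sh [-v java|levelzero|tornadovm] [-d index] model.bin" >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # Accept only the three versions listed in this README.
    case "$version" in
        java|levelzero|tornadovm) ;;
        *) echo "unknown version: $version" >&2; return 1 ;;
    esac
    # Exactly one positional argument, the model file, must remain.
    [ $# -eq 1 ] || { echo "expected exactly one model file" >&2; return 1; }

    echo "version=$version device=$device model=$1"
}

parse_run_args -v levelzero -d 1 stories15M.bin
# prints: version=levelzero device=1 model=stories15M.bin
```

Because getopts stops at the first non-option argument, the options must come before the model file, as they do in the examples above.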

License

MIT

jjfumero/llama2.tornadovm.java
