A Lightweight Language Model Execution Environment Using FPGA
Swan is an OSS project implemented in C++.
Its goal is to efficiently run language models on general-purpose FPGAs using High-Level Synthesis (HLS).
This project aims to enable language model inference on FPGAs, supporting AI applications in edge devices and environments with limited resources.
- Versatility: Supports common FPGA boards such as the KV260.
- Scalability: The source code is written in C++, making customization and extension easy.
- Lightweight: Adopts an efficient architecture that accounts for the size constraints of language models.
To build and run Swan, the following tools and libraries are required:
- CMake
- g++
- HLS tools (e.g., Vivado HLS)
To clone the Swan repository, run the following command:
$ git clone [email protected]:turingmotors/swan.git
$ cd swan
Download the 15M-parameter model from huggingface.co/karpathy/tinyllamas:
$ wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin -O model/stories15M.bin
$ wget https://raw.githubusercontent.com/leloykun/llama2.cpp/master/tokenizer.bin -O model/tokenizer.bin
See the technical blog for details on building Swan in an FPGA environment.
$ mkdir -p build && cd build
$ cmake ..
$ make && cd ..
Once the build is complete, you can run Swan with the following command:
$ ./build/swan
Swan supports the following options:
Usage: ./build/swan [options]
Options:
--weight_path : Weight file path
--vocab_path : Tokenizer file path
--max_seq : Maximum sequence length
--temp : Temperature for sampling
--color : Enable color output
--log : Enable log output
--help, -h : Show this help message
This project is inspired by llama2.c.
This project is released under the Apache License 2.0.
Contributions to Swan are highly welcome. Please submit feedback and improvement suggestions through Issues and Pull Requests.
The development of Swan is supported by Turing Inc.