Swan

A Lightweight Language Model Execution Environment Using FPGA


Swan is an open-source (OSS) project implemented in C++.
Its goal is to run language models efficiently on general-purpose FPGAs using High-Level Synthesis (HLS).

This project aims to enable language model inference on FPGAs, supporting AI applications on edge devices and in other resource-constrained environments.
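HLS tools compile annotated C/C++ functions into hardware. As a rough illustration of what that looks like, the sketch below writes the matrix-vector product that dominates llama2.c-style transformer inference in an HLS-friendly style. It is not Swan's kernel: the function name `matvec`, the buffer bound `kMaxN`, and the pragma placement are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical upper bound on the input length; not taken from Swan.
constexpr std::size_t kMaxN = 1024;

// Computes xout[i] = sum_j w[i * n + j] * x[j] for a row-major (d x n) matrix w.
// The "#pragma HLS" lines are hints for an HLS tool (Vivado/Vitis HLS style);
// an ordinary C++ compiler ignores them, possibly with an "unknown pragma" warning.
void matvec(float* xout, const float* x, const float* w, int n, int d) {
    assert(n > 0 && static_cast<std::size_t>(n) <= kMaxN);

    // Copy the input vector into a local buffer that HLS can map to on-chip memory.
    float xbuf[kMaxN];
    for (int j = 0; j < n; ++j) {
#pragma HLS PIPELINE
        xbuf[j] = x[j];
    }

    for (int i = 0; i < d; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) {
            // Requests a pipelined inner loop; the achievable initiation interval
            // depends on the tool and on the floating-point adder latency.
#pragma HLS PIPELINE
            acc += w[static_cast<std::size_t>(i) * n + j] * xbuf[j];
        }
        xout[i] = acc;
    }
}
```

Because a regular compiler treats the HLS pragmas as unknown and builds the function as plain software, code in this style can be checked on a CPU before it is synthesized for an FPGA.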

Features

  • Versatility: Supports common FPGA boards such as the KV260.
  • Extensibility: The source code is written in C++, making customization and extension easy.
  • Lightweight: Designed with the size constraints of language models in mind, using an efficient architecture.
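As a rough gauge of those size constraints, the sketch below counts the learned parameters of a llama2.c-style model from its hyperparameters. It assumes llama2.c's weight layout, with the output classifier sharing weights with the token embedding; `Config`, `count_params`, and `fp32_bytes` are hypothetical names, not part of Swan.

```cpp
#include <cassert>
#include <cstdint>

// Model hyperparameters in the order used by llama2.c checkpoints.
struct Config {
    int dim;         // embedding width
    int hidden_dim;  // feed-forward hidden width
    int n_layers;
    int n_heads;
    int n_kv_heads;
    int vocab_size;
    int seq_len;     // maximum context length (does not affect the count)
};

// Counts learned parameters following llama2.c's weight layout,
// assuming the classifier is shared with the token embedding table.
std::int64_t count_params(const Config& c) {
    assert(c.n_heads > 0 && c.dim % c.n_heads == 0);
    const std::int64_t dim = c.dim, hid = c.hidden_dim, L = c.n_layers;
    const std::int64_t head_size = dim / c.n_heads;
    const std::int64_t kv_dim = head_size * c.n_kv_heads;

    std::int64_t p = 0;
    p += static_cast<std::int64_t>(c.vocab_size) * dim;  // token embedding
    p += L * dim * 2;                                     // RMSNorm weights (attention + FFN)
    p += L * dim * dim * 2;                               // wq and wo
    p += L * dim * kv_dim * 2;                            // wk and wv
    p += L * dim * hid * 3;                               // w1, w2, w3
    p += dim;                                             // final RMSNorm
    return p;
}

// Bytes needed to store the parameters as 32-bit floats.
std::int64_t fp32_bytes(const Config& c) { return count_params(c) * 4; }
```

With the configuration we understand stories15M to use (dim 288, hidden_dim 768, 6 layers, 6 heads, 6 KV heads, vocabulary of 32000), this gives 15,191,712 parameters, or roughly 60.8 MB as 32-bit floats.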

Dependencies

To build and run Swan, the following tools and libraries are required:

  • CMake
  • g++
  • HLS tools (e.g., Vivado HLS)

Clone & Download Weight Files

To clone the Swan repository, run the following command:

$ git clone git@github.com:turingmotors/swan.git
$ cd swan

Download the 15M-parameter model from huggingface.co/karpathy/tinyllamas, together with the tokenizer file:

$ wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin -O model/stories15M.bin
$ wget https://raw.githubusercontent.com/leloykun/llama2.cpp/master/tokenizer.bin -O model/tokenizer.bin
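Since Swan is inspired by llama2.c and uses a llama2.c checkpoint, the weight file presumably begins with llama2.c's header of seven 32-bit integers. The sketch below reads that header; the layout is taken from llama2.c and has not been checked against Swan's own loader, and `CheckpointHeader` and `read_header` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <optional>
#include <string>

// Assumed header layout (from llama2.c): seven 32-bit ints at the start of the file.
struct CheckpointHeader {
    std::int32_t dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len;
    bool shared_classifier;  // llama2.c stores a negative vocab_size when NOT shared
};

// Reads the header in host byte order; assumes a little-endian host such as x86-64
// or the Arm cores on the KV260. Returns std::nullopt if the file cannot be read.
std::optional<CheckpointHeader> read_header(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    std::int32_t v[7];
    if (!in.read(reinterpret_cast<char*>(v), sizeof(v))) return std::nullopt;
    CheckpointHeader h{v[0], v[1], v[2], v[3], v[4], std::abs(v[5]), v[6], v[5] > 0};
    return h;
}
```

Calling `read_header("model/stories15M.bin")` after the download should report the model's dimensions, which is a quick way to confirm that the file arrived intact.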

Building

FPGA Environment

See the technical blog for details on building Swan for an FPGA environment.

CPU Environment

$ mkdir -p build && cd build
$ cmake ..
$ make && cd ..

Once the build is complete, you can run Swan with the following command:

$ ./build/swan

Command Line Options

Swan supports the following options:

Usage: ./build/swan [options]
Options:
  --weight_path   : Weight file path
  --vocab_path    : Tokenizer file path
  --max_seq       : Maximum sequence length
  --temp          : Temperature for sampling
  --color         : Enable color output
  --log           : Enable log output
  --help, -h      : Show this help message
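The `--temp` option sets the sampling temperature. Assuming Swan follows llama2.c's convention, a temperature of 0 selects the most likely token, and larger values flatten the distribution before sampling. A minimal sketch of that step, using the hypothetical name `sample_token`, is shown below; llama2.c also supports top-p sampling, which is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Picks the next token id from raw logits.
// temperature == 0: greedy argmax; otherwise sample from softmax(logits / temperature).
int sample_token(const std::vector<float>& logits, float temperature, std::mt19937& rng) {
    assert(!logits.empty() && temperature >= 0.0f);
    if (temperature == 0.0f) {
        return static_cast<int>(std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    // Subtract the largest logit before exponentiating for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> weights(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        weights[i] = std::exp((logits[i] - max_logit) / temperature);
    }
    // discrete_distribution normalizes the weights, so no explicit division by the sum is needed.
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

Lower temperatures make the output more deterministic; higher ones make it more varied at the cost of coherence.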

Reference Projects

This project is inspired by llama2.c.
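llama2.c builds its forward pass from a handful of small numeric routines. RMSNorm, adapted below from the version in llama2.c's `run.c`, is one of them; whether Swan implements it the same way is an assumption, and the signature here is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// RMSNorm as in llama2.c: out[j] = weight[j] * x[j] / sqrt(mean(x^2) + 1e-5).
void rmsnorm(float* out, const float* x, const float* weight, std::size_t n) {
    assert(n > 0);
    float ss = 0.0f;
    for (std::size_t j = 0; j < n; ++j) ss += x[j] * x[j];  // sum of squares
    ss = ss / static_cast<float>(n) + 1e-5f;                 // mean square plus epsilon
    const float inv = 1.0f / std::sqrt(ss);
    for (std::size_t j = 0; j < n; ++j) out[j] = weight[j] * (inv * x[j]);
}
```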

License

This project is released under the Apache License 2.0.

Contributions

Contributions to Swan are very welcome. Please submit feedback and suggestions for improvement through Issues and Pull Requests.
Turing Inc. supports the development of Swan.
