This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

BLOOM Refactor #141

Closed · wants to merge 25 commits into main from dfo/model/bloom

Commits (25)
- d84aa7f: Create a Model trait (danforbes, Apr 15, 2023)
- e0713a1: Bloom model (danforbes, Apr 15, 2023)
- 6bfda75: cargo fmt (danforbes, Apr 16, 2023)
- 73f59c3: Rename llama-rs to llm-base (danforbes, Apr 16, 2023)
- e670c25: Clippy (danforbes, Apr 16, 2023)
- c4b4176: Remove redundant associated Model type from Model trait (danforbes, Apr 16, 2023)
- 1cf305f: Remove associated Layer type from Model trait (danforbes, Apr 16, 2023)
- 0d4dde9: cargo fmt (danforbes, Apr 16, 2023)
- 849c28d: Docs (danforbes, Apr 16, 2023)
- 54ad890: Tests and examples (danforbes, Apr 16, 2023)
- 4ba7c1c: Layers are private (danforbes, Apr 16, 2023)
- dcf85ff: Merge branch 'main' of github.com:rustformers/llama-rs into dfo/model… (philpax, Apr 22, 2023)
- 43ecac1: Merge branch 'main' into dfo/model/bloom (philpax, Apr 25, 2023)
- 440bd69: Fix build (philpax, Apr 25, 2023)
- 5658484: refactor: introduce llm(-cli) (philpax, Apr 25, 2023)
- bcf5627: Fix model name in LLaMA inference example (danforbes, Apr 26, 2023)
- 5ac4b79: feat: wire up both bloom/llama to CLI (philpax, Apr 26, 2023)
- 1601240: Merge branch 'dfo/model/bloom' of github.com:danforbes/llama-rs into … (philpax, Apr 26, 2023)
- 1761512: Add example for testing BLOOM inference (danforbes, Apr 26, 2023)
- 8d2d9c6: cargo fmt (danforbes, Apr 26, 2023)
- 813bdd1: Add launch.json for debugging loading and inference (danforbes, Apr 26, 2023)
- c608b4b: Merge branch 'main' into dfo/model/bloom (danforbes, Apr 27, 2023)
- e19418c: Check tensor dimensions when loading (danforbes, Apr 27, 2023)
- e35f93b: `Model` -> `KnownModel`, `ErasedModel` -> `Model` (danforbes, Apr 27, 2023; see the trait sketch after this list)
- 288df7f: Merge branch 'main' into dfo/model/bloom (danforbes, Apr 29, 2023)
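The rename in e35f93b reflects the PR's central design: each architecture implements a concrete trait, while an object-safe, type-erased trait lets callers pick the architecture at runtime. Below is a minimal, self-contained sketch of that pattern; only the trait names come from the commit messages, and the method set and signatures are illustrative assumptions, not the PR's exact API.

```rust
/// Stand-in for the inference-session state kept in `llm-base`;
/// illustrative only.
pub struct InferenceSession {
    pub tokens: Vec<i32>,
}

/// Implemented by each concrete architecture (e.g. LLaMA, BLOOM),
/// which knows its own hyperparameter type at compile time.
pub trait KnownModel {
    type Hyperparameters;

    fn start_session(&self) -> InferenceSession;
    fn evaluate(&self, session: &mut InferenceSession, input: &[i32]);
}

/// Object-safe, type-erased counterpart: no associated types, so code
/// that selects the architecture at runtime can hold a `Box<dyn Model>`.
pub trait Model {
    fn start_session(&self) -> InferenceSession;
    fn evaluate(&self, session: &mut InferenceSession, input: &[i32]);
}

/// Blanket impl: every `KnownModel` is automatically usable as a `Model`.
impl<M: KnownModel> Model for M {
    fn start_session(&self) -> InferenceSession {
        KnownModel::start_session(self)
    }

    fn evaluate(&self, session: &mut InferenceSession, input: &[i32]) {
        KnownModel::evaluate(self, session, input)
    }
}
```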
Files changed
.vscode/launch.json (new file: 44 additions, 0 deletions)

```jsonc
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug example 'llama_inference'",
            "cargo": {
                "args": [
                    "build",
                    "--example=llama_inference",
                    "--package=llama"
                ],
                "filter": {
                    "name": "llama_inference",
                    "kind": "example"
                }
            },
            "args": ["${env:HOME}/.ggml-models/gpt4all-7b.bin"],
            "cwd": "${workspaceFolder}"
        },
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug example 'bloom_inference'",
            "cargo": {
                "args": [
                    "build",
                    "--example=bloom_inference",
                    "--package=bloom"
                ],
                "filter": {
                    "name": "bloom_inference",
                    "kind": "example"
                }
            },
            "args": ["${env:HOME}/.ggml-models/bloom-7b.bin"],
            "cwd": "${workspaceFolder}"
        }
    ]
}
```
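Outside the debugger, the two configurations correspond to ordinary Cargo invocations; a sketch, reusing the placeholder model paths from the config above:

```shell
# Build the two debug targets.
cargo build --example=llama_inference --package=llama
cargo build --example=bloom_inference --package=bloom

# Or run them directly, passing the model path as the first argument.
cargo run --example=llama_inference --package=llama -- ~/.ggml-models/gpt4all-7b.bin
cargo run --example=bloom_inference --package=bloom -- ~/.ggml-models/bloom-7b.bin
```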
Cargo.lock (56 additions, 17 deletions): generated file, diff not rendered.

Cargo.toml (12 additions, 3 deletions)

```diff
@@ -1,10 +1,16 @@
 [workspace]
 members = [
+    # Crates
     "ggml-sys",
-    "ggml",
     "ggml-format",
-    "llama-rs",
-    "llama-cli",
+    "ggml",
+    "llm-base",
+    "llama",
+    "bloom",
+    "llm",
+    "llm-cli",
+
+    # Tools
     "generate-ggml-bindings"
 ]
 resolver = "2"
@@ -13,4 +19,7 @@ resolver = "2"
 version = "0.1.0"

 [workspace.dependencies]
+bytemuck = "1.13.1"
+log = "0.4"
 rand = "0.8.5"
+serde = { version = "1.0", features = ["derive"] }
```
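With versions hoisted into `[workspace.dependencies]`, each member crate opts in with `workspace = true` instead of repeating the version pin; `bloom/Cargo.toml` below uses exactly this pattern:

```toml
# In a member crate's Cargo.toml: inherit the version pinned by the workspace.
[dependencies]
bytemuck = { workspace = true }
```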
README.md (22 additions, 12 deletions)

````diff
@@ -30,10 +30,11 @@ performance as the original code.
 Make sure you have a Rust 1.65.0 or above and C toolchain[^1] set up.

-`llama-rs` is a Rust library, while `llama-cli` is a CLI application that wraps
-`llama-rs` and offers basic inference capabilities.
+`llm-base`, `bloom`, and `llama` are Rust libraries, while `bloom-cli` and
+`llama-cli` are CLI applications that wrap `bloom` and `llama`, respectively,
+and offer basic inference capabilities.

-The following instructions explain how to build `llama-cli`.
+The following instructions explain how to build the CLI applications.

 **NOTE**: For best results, make sure to build and run in release mode.
 Debug builds are going to be very slow.
@@ -43,33 +44,34 @@ Debug builds are going to be very slow.
 Run

 ```shell
-cargo install --git https://github.com/rustformers/llama-rs llama-cli
+cargo install --git https://github.com/rustformers/llama-rs bloom-cli llama-cli
 ```

-to install `llama-cli` to your Cargo `bin` directory, which `rustup` is likely to
-have added to your `PATH`.
+to install `bloom-cli` and `llama-cli` to your Cargo `bin` directory, which
+`rustup` is likely to have added to your `PATH`.

-It can then be run through `llama-cli`.
+The CLI applications can then be run through `bloom-cli` and `llama-cli`, respectively.

 ### Building from repository

 Clone the repository, and then build it through

 ```shell
-cargo build --release --bin llama-cli
+cargo build --release
 ```

-The resulting binary will be at `target/release/llama-cli[.exe]`.
+The resulting binaries will be at `target/release/bloom-cli[.exe]` and
+`target/release/llama-cli[.exe]`, respectively.

-It can also be run directly through Cargo, using
+They can also be run directly through Cargo, using

 ```shell
-cargo run --release --bin llama-cli -- <ARGS>
+cargo run --release --bin {bloom,llama}-cli -- <ARGS>
 ```

 This is useful for development.

-### Getting the weights
+### Getting LLaMA weights

 In order to run the inference code in `llama-rs`, a copy of the model's weights
 are required.
@@ -107,6 +109,14 @@ cargo run -p llama-cli quantize /path/to/your/models/7B/ggml-model-f16.bin /path
 > The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
 > additional information on how to obtain and run specific models.

+### BLOOM
+
+The open-source [BLOOM](https://bigscience.huggingface.co/blog/bloom) model is
+also supported.
+[More information](https://huggingface.co/docs/transformers/model_doc/bloom)
+about BLOOM is available on HuggingFace, as are some
+[quantized models](https://huggingface.co/models?search=bloom%20ggml).
+
 _Support for other open source models is currently planned. For models where
 weights can be legally distributed, this section will be updated with scripts to
 make the install process as user-friendly as possible. Due to the model's legal
````
bloom/Cargo.toml (new file: 15 additions, 0 deletions)

```toml
[package]
name = "bloom"
version = { workspace = true }
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
ggml = { path = "../ggml" }
llm-base = { path = "../llm-base" }

bytemuck = { workspace = true }

[dev-dependencies]
rand = { workspace = true }
```
bloom/examples/bloom_inference.rs (new file: 33 additions, 0 deletions)

```rust
use std::{convert::Infallible, env::args, io::Write};

use llm_base::{snapshot, LoadError};

extern crate bloom;

fn main() -> Result<(), LoadError> {
    let args: Vec<String> = args().collect();

    // Load the BLOOM model from the path given as the first CLI argument.
    let bloom = bloom::Bloom::load(&args[1], true, 32, |_| {})?;

    // Set up an inference session with default parameters; the helper's name
    // suggests it restores a session snapshot if one exists.
    let (mut session, _) = snapshot::read_or_create_session(
        &bloom,
        Default::default(),
        Default::default(),
        Default::default(),
    );

    // Feed the prompt, generate up to 32 tokens, and print each token as it
    // is produced.
    let _ = session.inference_with_prompt::<Infallible>(
        &bloom,
        &Default::default(),
        "The best kind of wine is ",
        Some(32),
        &mut rand::thread_rng(),
        |t| {
            print!("{t}");
            std::io::stdout().flush().unwrap();

            Ok(())
        },
    );

    println!();
    Ok(())
}
```
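Assuming a quantized BLOOM model on disk (the path below is the placeholder used in `launch.json`), the example can be run with:

```shell
cargo run --release --example bloom_inference --package bloom -- ~/.ggml-models/bloom-7b.bin
```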