Conversation
Breaking changes:
- `EOD_TOKEN_ID` -> `EOT_TOKEN_ID`
- `temp` -> `temperature`
- `current_part` is now zero-indexed
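To illustrate what these renames mean for calling code, here is a minimal hedged sketch; the struct and constant below are local stand-ins so the snippet compiles on its own, not the crate's real definitions, and the token value is an assumption.

```rust
// Hypothetical stand-ins for illustration only; the real items live in
// llama-rs and have more fields.
struct InferenceParameters {
    temperature: f32, // was `temp` before this PR
}

const EOT_TOKEN_ID: i32 = 2; // was `EOD_TOKEN_ID`; the value 2 is assumed

fn is_end_of_text(token: i32) -> bool {
    token == EOT_TOKEN_ID
}

fn main() {
    let params = InferenceParameters { temperature: 0.8 };
    println!("sampling with temperature {}", params.temperature);
    println!("token 2 ends the text: {}", is_end_of_text(2));
}
```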
This is good to go aside from the AS fix (need to test on work laptop). Something I noticed is this signature:

```rust
pub fn inference_with_prompt<E: std::error::Error + 'static>(
    &mut self,
    model: &Model,
    vocab: &Vocabulary,
    inference_params: &InferenceParameters,
    inference_with_prompt_params: &InferenceWithPromptParameters,
    prompt: &str,
    rng: &mut impl rand::Rng,
    callback: impl Fn(&str) -> Result<(), E>,
) -> Result<InferenceStats, InferenceError> {
}
```

We could also potentially move the […] Is there a context in which you would use the […]
You'd have to add a lifetime to the struct and maybe it would need to be mutable also, so probably not worth it. (Also, that's a lot of tasks checked off. Nice work!)
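To make that trade-off concrete, here is a minimal sketch of bundling the borrowed arguments into a request struct; `InferenceRequest` and the stand-in `Model`/`Vocabulary` types are hypothetical, not from this PR, but they show why such a struct would need a lifetime parameter and a mutable borrow for the RNG.

```rust
// Hypothetical sketch: grouping inference_with_prompt's borrowed arguments.
// Holding borrows forces a lifetime parameter on the struct, and the RNG
// needs a mutable borrow because sampling advances its state.
use rand::Rng;

struct Model;
struct Vocabulary;

struct InferenceRequest<'a, R: Rng> {
    model: &'a Model,
    vocab: &'a Vocabulary,
    prompt: &'a str,
    rng: &'a mut R,
}

fn run<R: Rng>(req: InferenceRequest<'_, R>) {
    // A real implementation would tokenize `req.prompt` with `req.vocab`,
    // evaluate `req.model`, and sample tokens with `req.rng`.
    let _ = (req.model, req.vocab, req.prompt);
    let _sample: u32 = req.rng.gen();
}

fn main() {
    let (model, vocab) = (Model, Vocabulary);
    let mut rng = rand::thread_rng();
    run(InferenceRequest {
        model: &model,
        vocab: &vocab,
        prompt: "Hello",
        rng: &mut rng,
    });
}
```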
I'm on Apple Silicon and I haven't felt any performance issues versus the original implementation of llama.cpp in C. What are the performance issues?
We don't compile with any flags on ARM, so the compiler won't use NEON, etc. Easy enough to fix, just haven't done it yet.
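As a rough illustration of the fix being described (not the build script from this PR), a `cc`-based build.rs could pass architecture flags on aarch64 and link Accelerate on Apple targets; the file paths and exact flags below are assumptions.

```rust
// build.rs — hypothetical sketch of enabling NEON-friendly flags and
// Accelerate when compiling ggml with the `cc` crate. Paths and flag
// choices are assumptions, not taken from this PR's diff.
fn main() {
    let target = std::env::var("TARGET").unwrap_or_default();
    let mut build = cc::Build::new();
    build.file("ggml/ggml.c").include("ggml");

    if target.contains("aarch64") {
        // Let the C compiler target the host CPU so it can emit NEON code;
        // `flag_if_supported` skips flags the compiler doesn't understand.
        build.flag_if_supported("-mcpu=native");
    }
    if target.contains("apple") {
        // Route matrix/vector routines through Apple's Accelerate framework.
        build.define("GGML_USE_ACCELERATE", None);
        println!("cargo:rustc-link-lib=framework=Accelerate");
    }

    build.compile("ggml");
}
```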
Thanks a lot for the changes! :) Everything looks good, there's just a very minor nitpick comment from my end.
Also, why does `ggml.rs` appear as a removed file? The code seems to still be using it, and I don't see where that code has moved to 🤔
This builds with Accelerate and uses performance cores for CPU count.
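For reference, one way to derive a thread count from the performance cores on Apple Silicon (a sketch, not necessarily what this change does) is the `hw.perflevel0.physicalcpu` sysctl, with a fallback to the total available parallelism elsewhere.

```rust
use std::process::Command;

// Hypothetical sketch: default the thread count to the number of performance
// cores on Apple Silicon. `hw.perflevel0.physicalcpu` reports the
// performance-core count on macOS; other platforms use all available cores.
fn default_thread_count() -> usize {
    if cfg!(target_os = "macos") {
        let perf_cores = Command::new("sysctl")
            .args(["-n", "hw.perflevel0.physicalcpu"])
            .output()
            .ok()
            .and_then(|out| String::from_utf8(out.stdout).ok())
            .and_then(|s| s.trim().parse::<usize>().ok());
        if let Some(n) = perf_cores {
            return n;
        }
    }
    // Fallback: everything the OS reports as available.
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

fn main() {
    println!("defaulting to {} threads", default_thread_count());
}
```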
It's moved to […]
LGTM! 😄
Mostly housekeeping.
Closes #57.
Still to do:
- llama-rs: `usize` over `i32` where possible #18
- `RMSNorm` for normalization #80
- `InferenceSession` `Clone`-able #48
- `infer_next_token` still works after end of text #44