# LocalLm

An API to query local language models using different backends or the browser.

| Name | Description | Doc |
| --- | --- | --- |
| `@locallm/types` | The shared data types | Api doc - Readme |
| `@locallm/api` | Run local language models using different backends | Api doc - Readme |
| `@locallm/browser` | Run quantized language models inside the browser | Api doc - Readme |
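The types package can also be consumed on its own, for example to type a custom wrapper. A minimal sketch, assuming `@locallm/types` exports interfaces named `InferenceParams` and `InferenceResult` (the export names are assumptions based on how the examples below use parameters and results; check the package Api doc for the real names):

```ts
// Sketch only: these export names are assumptions, not a confirmed API surface.
import type { InferenceParams, InferenceResult } from "@locallm/types";

// Hypothetical helper typed against the shared interfaces.
function describeRun(params: InferenceParams, res: InferenceResult): void {
  console.log("Params:", params);
  console.log("Stats:", res.stats); // res.stats is shown in the browser example below
}
```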

## Supported backends

This readme demonstrates two providers: Koboldcpp, queried over HTTP, and Wllama for in-browser inference; see the Api doc for the full list of supported backends.

## Quickstart

### Api

```bash
npm install @locallm/api
# or
yarn add @locallm/api
```

Example with the Koboldcpp provider:

```js
import { Lm } from "@locallm/api";

const lm = new Lm({
  providerType: "koboldcpp",
  serverUrl: "http://localhost:5001",
  // stream tokens to stdout as they are emitted
  onToken: (t) => process.stdout.write(t),
});
// format the prompt with a Mistral style instruct template
const template = "<s>[INST] {prompt} [/INST]";
const _prompt = template.replace("{prompt}", "list the planets in the solar system");
// run the inference query
const res = await lm.infer(_prompt, {
  temperature: 0,
  top_p: 0.35,
  n_predict: 200,
});
console.log(res);
```
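Switching backends only changes the provider configuration. A sketch, assuming an `ollama` provider type, the default Ollama port, and a `loadModel` step (all three are assumptions not shown in this readme; verify them against the Api doc):

```ts
import { Lm } from "@locallm/api";

// Assumptions: the "ollama" providerType value, port 11434 and the
// loadModel call are not confirmed by this readme; check the Api doc.
const lm = new Lm({
  providerType: "ollama",
  serverUrl: "http://localhost:11434",
  onToken: (t) => process.stdout.write(t),
});
await lm.loadModel("llama3"); // hypothetical model name
// the rest of the workflow is identical to the Koboldcpp example
const res = await lm.infer("list the planets in the solar system", {
  temperature: 0,
  n_predict: 200,
});
console.log(res);
```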

### In browser inference

Example of in-browser inference with Qwen2 0.5B:

```html
<div id="output"></div>
<script src="https://unpkg.com/modprompt/dist/mod.min.js"></script>
<script src="https://unpkg.com/@locallm/browser/dist/main.min.js"></script>
<script>
    const out = document.getElementById('output');
    const lm = $lm.WllamaProvider.init({
        onToken: (t) => { out.innerText = t },
        onStartEmit: () => { /* called when the model starts emitting tokens */ }
    });
    const model = {
        name: "Qwen2 0.5b",
        url: "https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GGUF/resolve/main/qwen2-0_5b-instruct-q5_k_m.gguf",
        ctx: 32768,
    }

    // report download progress while the gguf file is being fetched
    const onModelLoading = (st) => {
        const msg = "Model downloading: " + st.percent + " %";
        console.log(msg);
        out.innerText = msg;
        if (st.percent == 100) {
            out.innerText = "Loading model into memory ..."
        }
    }

    lm.loadBrowsermodel(model.name, model.url, model.ctx, onModelLoading).then(() => {
        out.innerText = "Ingesting prompt ...";
        // build a chatml formatted prompt with the modprompt template library
        const p = new $tpl.PromptTemplate("chatml")
            .replaceSystem("You are an AI assistant")
            .prompt("List the orbital periods of the planets of the solar system.")
        lm.infer(
            p,
            { temperature: 0, min_p: 0.05 }
        ).then((res) => {
            console.log("Stats", res.stats)
        });
    });
</script>
```
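The resolved result is used above only for its `stats` field. To render the final completion on the page, the handler can be extended; a sketch, assuming the result object also carries the generated text under a `text` property (the property name is an assumption, while `stats` is confirmed by the example above):

```ts
// Sketch: extend the .then handler from the example above.
lm.infer(p, { temperature: 0, min_p: 0.05 }).then((res) => {
    out.innerText = res.text; // assumed property holding the full completion
    console.log("Stats", res.stats); // shown in the example above
});
```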

## Examples

Check the examples directory for more use cases.
