# [Engine] Refactor ChatModule to Engine, update all examples (#361)
### Overview
This PR renames `ChatModule` to `Engine`, with very few
implementation changes (mainly new APIs and refactoring). The rename is
motivated by the fact that `ChatModule` has already expanded beyond
simple chatting; `Engine` also gives us a natural base to build on
when we support other modalities in the future.
This PR also:
- Introduces the factory methods `CreateEngine()` and `CreateWebEngine()` for
initializing an engine with a model loaded. Both take an
`EngineConfig`, which is essentially a wrapper of the optional configs
(see the API usage below for more, and the sketch after this list).
- Updates all examples (all tested), separates the OpenAI examples, and
categorizes the examples in `examples/README.md`.
- Updates `README.md` to reflect recent changes.
- Deprecates and removes `ChatRestModule`.
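
For orientation, the optional fields bundled by `EngineConfig` can be sketched as below. Only `initProgressCallback` is confirmed by the usage in this PR description; the other field names are assumptions inferred from `reload(selectedModel, chatConfig, appConfig)`, not the authoritative definition.

```typescript
// A hedged sketch of EngineConfig, not the real declaration:
interface EngineConfigSketch {
  // confirmed by the API usage below
  initProgressCallback?: (report: webllm.InitProgressReport) => void;
  // assumed: per-chat options, cf. reload()'s chatConfig argument
  chatOpts?: unknown;
  // assumed: custom model-list config, cf. reload()'s appConfig argument
  appConfig?: webllm.AppConfig;
}
```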
### API Usage
In addition, this PR finalizes the OpenAI API:
```typescript
const initProgressCallback = (report: webllm.InitProgressReport) => {
  const label = document.getElementById("init-label");
  if (label) label.innerText = report.text; // guard: the element may be absent
};
const engine: webllm.EngineInterface = await webllm.CreateEngine(
"Llama-2-7b-chat-hf-q4f32_1",
{initProgressCallback: initProgressCallback}
);
const reply = await engine.chat.completions.create({
messages: [{ "role": "user", "content": "Tell me about Pittsburgh." }]
});
console.log(reply);
console.log(await engine.runtimeStatsText());
```
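
Since the API is OpenAI-compatible, the generated text should sit in the usual place in the response object; a one-line sketch, assuming full response-shape parity with OpenAI:

```typescript
console.log(reply.choices[0].message.content);
```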
Note that if we need to separate the instantiation of `Engine` from the
loading of a model (which is indeed more convenient for examples like
`simple-chat`), we can equivalently substitute
```typescript
const engine: webllm.EngineInterface = await webllm.CreateEngine(
"Llama-2-7b-chat-hf-q4f32_1",
{initProgressCallback: initProgressCallback}
);
```
with
```typescript
const engine: webllm.EngineInterface = new webllm.Engine();
engine.setInitProgressCallback(initProgressCallback);
// selectedModel, chatConfig, and appConfig are supplied by the surrounding app
await engine.reload(selectedModel, chatConfig, appConfig);
```
Updated `README.md`:

WebLLM is a modular and customizable javascript package that directly brings language model chats onto web browsers with hardware acceleration. **Everything runs inside the browser with no server support and is accelerated with WebGPU.**

**WebLLM is fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat).** That is, you can use the same OpenAI API on **any open source models** locally, with functionalities including json-mode, function-calling, streaming, etc.

We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.

**[Check out our demo webpage to try it out!](https://webllm.mlc.ai/)**

You can use WebLLM as a base [npm package](https://www.npmjs.com/package/@mlc-ai/web-llm) and build your own web application on top of it by following the [documentation](https://mlc.ai/mlc-llm/docs/deploy/javascript.html) and checking out [Get Started](#get-started).

This project is a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm), which runs LLMs natively on iPhone and other native local environments.
In the Get Started section ("You can check out [examples/get-started](examples/get-started/) to see the complete example."), the old `ChatModule`-based snippet is removed:

```typescript
import * as webllm from "@mlc-ai/web-llm";

// We use label to intentionally keep it simple
function setLabel(id: string, text: string) {
  const label = document.getElementById(id);
  if (label == null) {
    throw Error("Cannot find label " + id);
  }
  label.innerText = text;
}

async function main() {
  // create a ChatModule,
  const chat = new webllm.ChatModule();
  // This callback allows us to report initialization progress
  // ...
}
```
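
Assembled from the API usage above (not the verbatim new README text), the post-refactor equivalent of this snippet reads:

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  const engine: webllm.EngineInterface = await webllm.CreateEngine(
    "Llama-2-7b-chat-hf-q4f32_1",
    { initProgressCallback: (report) => console.log(report.text) }
  );
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Tell me about Pittsburgh." }],
  });
  console.log(reply);
}

main();
```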
You can find a complete chat app example in [examples/simple-chat](examples/simple-chat/).

### Chrome Extension

You can also find examples of building a Chrome extension with WebLLM in [examples/chrome-extension](examples/chrome-extension/) and [examples/chrome-extension-webgpu-service-worker](examples/chrome-extension-webgpu-service-worker/). The latter leverages a service worker, so the extension is persistent in the background.

## Full OpenAI Compatibility

WebLLM is designed to be fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat). Thus, besides building a simple chat bot, you can also have the following functionalities with WebLLM (a streaming sketch follows this list):

- [streaming](examples/streaming): return output as chunks in real-time in the form of an AsyncGenerator
- [json-mode](examples/json-mode): efficiently ensure output is in json format, see [OpenAI Reference](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) for more.
- [function-calling](examples/function-calling): function calling with fields `tools` and `tool_choice`.
- [seed-to-reproduce](examples/seed-to-reproduce): use seeding to ensure reproducible output with the field `seed`.
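
A minimal streaming sketch, reusing the `engine` from Get Started and assuming the chunk shape matches OpenAI's `delta` format (as the streaming example does):

```typescript
const chunks = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Tell me about Pittsburgh." }],
  stream: true, // returns an AsyncGenerator of chunks instead of one reply
});
let text = "";
for await (const chunk of chunks) {
  text += chunk.choices[0]?.delta?.content ?? "";
  console.log(text); // render partial output in real time
}
```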
## Model Support
We export all supported models in `webllm.prebuiltAppConfig`, where you can see a list of models that you can simply call `const engine: webllm.EngineInterface = await webllm.CreateEngine(anyModel)` with (see the sketch after this list). Prebuilt models include:

- Llama-2
- Gemma
- Phi-1.5 and Phi-2
- Mistral-7B-Instruct
- OpenHermes-2.5-Mistral-7B
- NeuralHermes-2.5-Mistral-7B
- TinyLlama
- RedPajama
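
To see the concrete model ids accepted by `CreateEngine`, you can inspect the exported config; a small sketch, assuming the `model_list` entries carry a `model_id` field as in the custom `appConfig` example below:

```typescript
import * as webllm from "@mlc-ai/web-llm";

// Collect the ids of all prebuilt models (field name per the appConfig example below)
const modelIds = webllm.prebuiltAppConfig.model_list.map((m) => m.model_id);
console.log(modelIds); // e.g. ["Llama-2-7b-chat-hf-q4f32_1", ...]
```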
Alternatively, you can compile your own model and weights as described below.
WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm). It reuses the model artifacts and build flow of MLC LLM; please check out [...] on how to add new model weights and libraries to WebLLM.

Here, we go over the high-level idea. There are two elements of the WebLLM package that enable new models and weight variants:

- `model_url`: Contains a URL to model artifacts, such as weights and meta-data.
- `model_lib_url`: A URL to the web assembly library (i.e. wasm file) that contains the executables to accelerate the model computations.

Both are customizable in WebLLM.
```typescript
async function main() {
  const appConfig = {
    "model_list": [
      {
        "model_url": "/url/to/my/llama",
        "model_id": "MyLlama-3b-v1-q4f32_0",
        "model_lib_url": "/url/to/myllama3b.wasm",
      },
    ],
  };
  // ...
  // The chat will load the model from "/url/to/my/llama"
  // and cache it in the browser cache.
  // The chat will also load the model library from "/url/to/myllama3b.wasm",
  // assuming that it is compatible with the model.
}
```
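
Loading this customized model then goes through the same `reload()` path shown in the API usage above; a minimal sketch (passing `undefined` for the optional chat config is an assumption about the signature):

```typescript
const engine = new webllm.Engine();
engine.setInitProgressCallback((report) => console.log(report.text));
// chatConfig is skipped; appConfig is the custom config defined above
await engine.reload("MyLlama-3b-v1-q4f32_0", undefined, appConfig);
```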
Updated `examples/README.md` (22 additions, 3 deletions):
Please send a pull request if you find things that belong here.

## Tutorial Examples

Note that all examples below run in-browser and use WebGPU as a backend.

#### Basic Chat Completion
- [get-started](get-started): minimum get-started example with chat completion.
- [get-started-web-worker](get-started-web-worker): same as get-started, but using a web worker.
- [multi-round-chat](multi-round-chat): while the APIs are functional, we internally optimize so that multi-round chat usage can reuse the KV cache (see the sketch below).
- [simple-chat](simple-chat): a minimum and complete chat bot app.
- [next-simple-chat](next-simple-chat): a minimum and complete chat bot app with [Next.js](https://nextjs.org/).

#### Advanced OpenAI API Capabilities
These examples demonstrate various capabilities via WebLLM's OpenAI-like API.
- [streaming](streaming): return output as chunks in real-time in the form of an AsyncGenerator
- [json-mode](json-mode): efficiently ensure output is in json format, see [OpenAI Reference](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) for more.
- [function-calling](function-calling): function calling with fields `tools` and `tool_choice`.
- [seed-to-reproduce](seed-to-reproduce): use seeding to ensure reproducible output with the field `seed`.

#### Chrome Extension
- [chrome-extension](chrome-extension): a Chrome extension that does not have a persistent background.
- [chrome-extension-webgpu-service-worker](chrome-extension-webgpu-service-worker): a Chrome extension using a service worker, hence having a persistent background.

#### Others
- [logit-processor](logit-processor): while `logit_bias` is supported, we additionally support stateful logit processing where users can specify their own rules. We also expose the low-level API `forwardTokensAndSample()`.
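
To make the multi-round point concrete, a hedged sketch: the message-history pattern follows the OpenAI API, and `engine` is created as in get-started.

```typescript
const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "Tell me about Pittsburgh." },
];
const first = await engine.chat.completions.create({ messages });
// Append the reply and a follow-up question, then ask again;
// internally the engine can reuse the KV cache for the shared prefix.
messages.push({ role: "assistant", content: first.choices[0].message.content ?? "" });
messages.push({ role: "user", content: "How large is its population?" });
const second = await engine.chat.completions.create({ messages });
console.log(second.choices[0].message.content);
```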
One of the Chrome extension scripts also switches `var` to `let` when building the context-augmented prompt:

```javascript
// check if the request contains a new message from the user
if (request.input) {
  let inp = request.input;
  if (context.length > 0) {
    inp = "Use only the following context when answering the question at the end. " +
      "Don't use any other knowledge.\n" + context +
      "\n\nQuestion: " + request.input + "\n\nHelpful Answer: ";
  }
  // ...
}
```