
Commit 489d882

# [Engine] Refactor ChatModule to Engine, update all examples (#361)
### Overview

This PR refactors `ChatModule` into `Engine`, with very few implementation changes (mainly new APIs and refactoring). This is motivated by the fact that `ChatModule` has already grown beyond simple chatting, and we can easily build on top of `Engine` when we support other modalities in the future. This PR also:

- Introduces the factory methods `CreateEngine()` and `CreateWebWorkerEngine()` for initializing an engine with a model loaded. Both take an `EngineConfig`, which is essentially a wrapper of the optional configs (see API usage below for more).
- Updates all examples (all tested), separates the OpenAI examples, and categorizes the examples in `examples/README.md`.
- Updates `README.md` to reflect recent changes.
- Deprecates and removes `ChatRestModule`.

### API Usage

Besides, we finalize the OpenAI API in this PR:

```typescript
const initProgressCallback = (report: webllm.InitProgressReport) => {
  const label = document.getElementById("init-label");
  label.innerText = report.text;
};
const engine: webllm.EngineInterface = await webllm.CreateEngine(
  "Llama-2-7b-chat-hf-q4f32_1",
  { initProgressCallback: initProgressCallback }
);

const reply = await engine.chat.completions.create({
  messages: [{ "role": "user", "content": "Tell me about Pittsburgh." }]
});
console.log(reply);
console.log(await engine.runtimeStatsText());
```

Note that if we need to separate the instantiation of `Engine` from the loading of a model (which is more convenient for examples like `simple-chat`), we can equivalently substitute

```typescript
const engine: webllm.EngineInterface = await webllm.CreateEngine(
  "Llama-2-7b-chat-hf-q4f32_1",
  { initProgressCallback: initProgressCallback }
);
```

with

```typescript
const engine: webllm.EngineInterface = new webllm.Engine();
engine.setInitProgressCallback(initProgressCallback);
await engine.reload(selectedModel, chatConfig, appConfig);
```
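The same OpenAI-style API also covers the streaming case mentioned above. A minimal sketch, assuming the same `engine` as in the snippet above and the chunk/delta format used in the updated chrome-extension example later in this commit:

```typescript
// Minimal streaming sketch (assumes `engine` was created via webllm.CreateEngine as above).
// Chunks follow the OpenAI delta format; concatenate deltas to rebuild the full reply.
const chunks = await engine.chat.completions.create({
  stream: true,
  messages: [{ "role": "user", "content": "Tell me about Pittsburgh." }],
});
let streamedReply = "";
for await (const chunk of chunks) {
  const delta = chunk.choices[0].delta.content;
  if (delta) {
    streamedReply += delta;
  }
}
console.log(streamedReply);
```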
1 parent f6ef70a commit 489d882

63 files changed, +1153 −1154 lines changed

## README.md

Lines changed: 88 additions & 69 deletions
````diff
@@ -3,12 +3,17 @@
 # Web LLM
 | [NPM Package](https://www.npmjs.com/package/@mlc-ai/web-llm) | [Get Started](#get-started) | [Examples](examples) | [Documentation](https://mlc.ai/mlc-llm/docs/deploy/javascript.html) | [MLC LLM](https://github.com/mlc-ai/mlc-llm) | [Discord][discord-url] |
 
-WebLLM is a modular, customizable javascript package that directly
+WebLLM is a modular and customizable javascript package that directly
 brings language model chats directly onto web browsers with hardware acceleration.
 **Everything runs inside the browser with no server support and is accelerated with WebGPU.**
+
+**WebLLM is fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat).**
+That is, you can use the same OpenAI API on **any open source models** locally, with functionalities
+including json-mode, function-calling, streaming, etc.
+
 We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.
 
-**[Check out our demo webpage to try out!](https://webllm.mlc.ai/)**
+**[Check out our demo webpage to try it out!](https://webllm.mlc.ai/)**
 You can use WebLLM as a base [npm package](https://www.npmjs.com/package/@mlc-ai/web-llm) and build your own web application on top of it by following the [documentation](https://mlc.ai/mlc-llm/docs/deploy/javascript.html) and checking out [Get Started](#get-started).
 This project is a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm),
 which runs LLMs natively on iPhone and other native local environments.
@@ -27,43 +32,42 @@ You can check out [examples/get-started](examples/get-started/) to see the compl
 ```typescript
 import * as webllm from "@mlc-ai/web-llm";
 
-// We use label to intentionally keep it simple
-function setLabel(id: string, text: string) {
-  const label = document.getElementById(id);
-  if (label == null) {
-    throw Error("Cannot find label " + id);
-  }
-  label.innerText = text;
-}
-
 async function main() {
-  // create a ChatModule,
-  const chat = new webllm.ChatModule();
-  // This callback allows us to report initialization progress
-  chat.setInitProgressCallback((report: webllm.InitProgressReport) => {
-    setLabel("init-label", report.text);
+  const initProgressCallback = (report: webllm.InitProgressReport) => {
+    const label = document.getElementById("init-label");
+    label.innerText = report.text;
+  };
+  const selectedModel = "Llama-2-7b-chat-hf-q4f32_1";
+  const engine: webllm.EngineInterface = await webllm.CreateEngine(
+    selectedModel,
+    /*engineConfig=*/{ initProgressCallback: initProgressCallback }
+  );
+
+  const reply0 = await engine.chat.completions.create({
+    messages: [{ "role": "user", "content": "Tell me about Pittsburgh." }]
   });
-  // You can also try out "RedPajama-INCITE-Chat-3B-v1-q4f32_1"
-  await chat.reload("Llama-2-7b-chat-hf-q4f32_1");
+  console.log(reply0);
+  console.log(await engine.runtimeStatsText());
+}
 
-  const generateProgressCallback = (_step: number, message: string) => {
-    setLabel("generate-label", message);
-  };
+main();
+```
 
-  const prompt0 = "What is the capital of Canada?";
-  setLabel("prompt-label", prompt0);
-  const reply0 = await chat.generate(prompt0, generateProgressCallback);
-  console.log(reply0);
+Note that if you need to separate the instantiation of `webllm.Engine` from loading a model, you could substitute
 
-  const prompt1 = "Can you write a poem about it?";
-  setLabel("prompt-label", prompt1);
-  const reply1 = await chat.generate(prompt1, generateProgressCallback);
-  console.log(reply1);
+```typescript
+const engine: webllm.EngineInterface = await webllm.CreateEngine(
+  selectedModel,
+  /*engineConfig=*/{ initProgressCallback: initProgressCallback }
+);
+```
 
-  console.log(await chat.runtimeStatsText());
-}
+with the equivalent
 
-main();
+```typescript
+const engine: webllm.EngineInterface = new webllm.Engine();
+engine.setInitProgressCallback(initProgressCallback);
+await engine.reload(selectedModel, chatConfig, appConfig);
 ```
 
 ### Using Web Worker
@@ -72,46 +76,74 @@ WebLLM comes with API support for WebWorker so you can hook
 the generation process into a separate worker thread so that
 the compute in the webworker won't disrupt the UI.
 
-We first create a worker script that created a ChatModule and
+We first create a worker script that created a Engine and
 hook it up to a handler that handles requests.
 
 ```typescript
 // worker.ts
-import { ChatWorkerHandler, ChatModule } from "@mlc-ai/web-llm";
+import { EngineWorkerHandler, Engine } from "@mlc-ai/web-llm";
 
-// Hookup a chat module to a worker handler
-const chat = new ChatModule();
-const handler = new ChatWorkerHandler(chat);
+// Hookup an Engine to a worker handler
+const engine = new Engine();
+const handler = new EngineWorkerHandler(engine);
 self.onmessage = (msg: MessageEvent) => {
   handler.onmessage(msg);
 };
 ```
 
-Then in the main logic, we create a `ChatWorkerClient` that
-implements the same `ChatInterface`. The rest of the logic remains the same.
+Then in the main logic, we create a `WebWorkerEngine` that
+implements the same `EngineInterface`. The rest of the logic remains the same.
 
 ```typescript
 // main.ts
 import * as webllm from "@mlc-ai/web-llm";
 
 async function main() {
-  // Use a chat worker client instead of ChatModule here
-  const chat = new webllm.ChatWorkerClient(new Worker(
-    new URL('./worker.ts', import.meta.url),
-    {type: 'module'}
-  ));
+  // Use a WebWorkerEngine instead of Engine here
+  const engine: webllm.EngineInterface = await webllm.CreateWebWorkerEngine(
+    /*worker=*/new Worker(
+      new URL('./worker.ts', import.meta.url),
+      { type: 'module' }
+    ),
+    /*modelId=*/selectedModel,
+    /*engineConfig=*/{ initProgressCallback: initProgressCallback }
+  );
  // everything else remains the same
 }
 ```
 
 
 ### Build a ChatApp
 
-You can find a complete
-a complete chat app example in [examples/simple-chat](examples/simple-chat/).
+You can find a complete chat app example in [examples/simple-chat](examples/simple-chat/).
+
+### Chrome Extension
+
+You can also find examples on building chrome extension with WebLLM in [examples/chrome-extension](examples/chrome-extension/) and [examples/chrome-extension-webgpu-service-worker](examples/chrome-extension-webgpu-service-worker/). The latter one leverages service worker, so the extension is persisten in the background.
 
+## Full OpenAI Compatibility
 
-## Customized Model Weights
+WebLLM is designed to be fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat). Thus, besides building simple chat bot, you can also have the following functionalities with WebLLM:
+- [streaming](examples/streaming): return output as chunks in real-time in the form of an AsyncGenerator
+- [json-mode](examples/json-mode): efficiently ensure output is in json format, see [OpenAI Reference](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) for more.
+- [function-calling](examples/function-calling): function calling with fields `tools` and `tool_choice`.
+- [seed-to-reproduce](examples/seed-to-reproduce): use seeding to ensure reproducible output with fields `seed`.
+
+## Model Support
+
+We export all supported models in `webllm.prebuiltAppConfig`, where you can see a list of models
+that you can simply call `const engine: webllm.EngineInterface = await webllm.CreateEngine(anyModel)` with.
+Prebuilt models include:
+- Llama-2
+- Gemma
+- Phi-1.5 and Phi-2
+- Mistral-7B-Instruct
+- OpenHermes-2.5-Mistral-7B
+- NeuralHermes-2.5-Mistral-7B
+- TinyLlama
+- RedPajama
+
+Alternatively, you can compile your own model and weights as described below.
 
 WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm).
 It reuses the model artifact and builds flow of MLC LLM, please check out
@@ -120,18 +152,17 @@ on how to add new model weights and libraries to WebLLM.
 
 Here, we go over the high-level idea. There are two elements of the WebLLM package that enables new models and weight variants.
 
-- model_url: Contains a URL to model artifacts, such as weights and meta-data.
-- model_lib_url: A URL to the web assembly library (i.e. wasm file) that contains the executables to accelerate the model computations.
+- `model_url`: Contains a URL to model artifacts, such as weights and meta-data.
+- `model_lib_url`: A URL to the web assembly library (i.e. wasm file) that contains the executables to accelerate the model computations.
 
 Both are customizable in the WebLLM.
 
 ```typescript
 async main() {
-  const myLlamaUrl = "/url/to/my/llama";
   const appConfig = {
     "model_list": [
       {
-        "model_url": myLlamaUrl,
+        "model_url": "/url/to/my/llama",
         "model_id": "MyLlama-3b-v1-q4f32_0"
         "model_lib_url": "/url/to/myllama3b.wasm",
       }
@@ -149,30 +180,18 @@ async main() {
   // and cache it in the browser cache
   // The chat will also load the model library from "/url/to/myllama3b.wasm",
  // assuming that it is compatible to the model in myLlamaUrl.
-  await chat.reload("MyLlama-3b-v1-q4f32_0", chatOpts, appConfig);
+  const engine = await webllm.CreateEngine(
+    "MyLlama-3b-v1-q4f32_0",
+    /*engineConfig=*/{ chatOpts: chatOpts, appConfig: appConfig }
+  );
 }
 ```
 
 In many cases, we only want to supply the model weight variant, but
 not necessarily a new model (e.g. `NeuralHermes-Mistral` can reuse `Mistral`'s
-model library; `WizardMath` can reuse `Llama-2`'s model library). For
-an example of how a model library is shared by different model variants,
-see `examples/simple-chat/src/gh-config.js`. We also provide
-a plethora of prebuilt model libraries, including:
-
-- `Llama-2-7b-chat-hf-q4f32_1`: Llama-7b models.
-- `RedPajama-INCITE-Chat-3B-v1-q4f32_1`: RedPajama-3B variants.
-- `Mistral-7B-Instruct-v0.1-q4f16_1`: Mistral-7B variants.
-- and many more at [binary-mlc-llm-libs](https://github.com/mlc-ai/binary-mlc-llm-libs).
-
-## Use WebLLM Package
-
-You can directly use WebLLM in your package via npm. Checkout instructions
-in the following project
+model library). For examples on how a model library can be shared by different model variants,
+see `prebuiltAppConfig`.
 
-- [get-started](examples/get-started): minimum get started example.
-- [web-worker](examples/web-worker): get started with web worker backed chat.
-- [simple-chat](examples/simple-chat): a mininum and complete chat app.
 
 ## Build WebLLM Package From Source
````
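For reference, the customized-model flow split across the two hunks above can be assembled into one self-contained sketch; the URLs and model id are the placeholders used in the README, and the engine is created the same way as in the Get Started snippet (a sketch, not code taken from this commit):

```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  // Placeholder URLs and model id, as in the README above.
  const appConfig = {
    "model_list": [
      {
        "model_url": "/url/to/my/llama",
        "model_id": "MyLlama-3b-v1-q4f32_0",
        "model_lib_url": "/url/to/myllama3b.wasm",
      },
    ],
  };
  // Weights are fetched from model_url and cached in the browser cache;
  // the wasm model library is loaded from model_lib_url.
  const engine = await webllm.CreateEngine(
    "MyLlama-3b-v1-q4f32_0",
    /*engineConfig=*/{ appConfig: appConfig }
  );
  console.log(await engine.runtimeStatsText());
}

main();
```
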
## examples/README.md

Lines changed: 22 additions & 3 deletions
````diff
@@ -5,9 +5,28 @@ Please send a pull request if you find things that belongs to here.
 
 ## Tutorial Examples
 
-- [get-started](get-started): minimum get started example.
-- [web-worker](web-worker): get started with web worker backed chat.
-- [simple-chat](simple-chat): a mininum and complete chat app.
+Note that all examples below run in-browser and use WebGPU as a backend.
+
+#### Basic Chat Completion
+- [get-started](get-started): minimum get started example with chat completion.
+- [get-started-web-worker](get-started-web-worker): same as get-started, but using web worker.
+- [multi-round-chat](multi-round-chat): while APIs are functional, we internally optimize so that multi round chat usage can reuse KV cache
+- [simple-chat](simple-chat): a mininum and complete chat bot app.
+- [next-simple-chat](next-simple-chat): a mininum and complete chat bot app with [Next.js](https://nextjs.org/).
+
+#### Advanced OpenAI API Capabilities
+These examples demonstrate various capabilities via WebLLM's OpenAI-like API.
+- [streaming](streaming): return output as chunks in real-time in the form of an AsyncGenerator
+- [json-mode](json-mode): efficiently ensure output is in json format, see [OpenAI Reference](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) for more.
+- [function-calling](function-calling): function calling with fields `tools` and `tool_choice`.
+- [seed-to-reproduce](seed-to-reproduce): use seeding to ensure reproducible output with fields `seed`.
+
+#### Chrome Extension
+- [chrome-extension](chrome-extension): chrome extension that does not have a persistent background
+- [chrome-extension-webgpu-service-worker](chrome-extension-webgpu-service-worker): chrome extension using service worker, hence having a persistent background
+
+#### Others
+- [logit-processor](logit-processor): while `logit_bias` is supported, we additionally support stateful logit processing where users can specify their own rules. We also expose low-level API `forwardTokensAndSample()`.
 
 ## Demo Spaces
 
````
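As a quick illustration of the `seed-to-reproduce` entry above, a minimal sketch; it assumes the `seed` field is passed on the same OpenAI-style request object and that `engine` was created as in the README examples (the prompt and temperature here are arbitrary):

```typescript
// Sketch: reproducible sampling via the OpenAI-style `seed` field.
const ask = () => engine.chat.completions.create({
  messages: [{ "role": "user", "content": "Write a haiku about WebGPU." }],
  temperature: 0.8,
  seed: 42, // same seed + same request should yield the same output
});
const replyA = await ask();
const replyB = await ask();
console.log(replyA.choices[0].message.content === replyB.choices[0].message.content);
```
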
Lines changed: 30 additions & 28 deletions
````diff
@@ -1,52 +1,54 @@
-import {ChatRestModule, ChatInterface, ChatModule, InitProgressReport} from "@mlc-ai/web-llm";
+import { EngineInterface, CreateEngine, InitProgressReport, ChatCompletionMessageParam } from "@mlc-ai/web-llm";
 
-// TODO: Surface this as an option to the user
-const useWebGPU = true;
-var model_loaded = false;
+let model_loaded = false;
 
-var cm: ChatInterface;
-if (!useWebGPU) {
-  cm = new ChatRestModule();
-} else {
-  cm = new ChatModule();
-}
+let engine: EngineInterface;
+const chatHistory: ChatCompletionMessageParam[] = [];
 
-// Set reponse callback for chat module
-const generateProgressCallback = (_step: number, message: string) => {
-  // send the answer back to the content script
-  chrome.runtime.sendMessage({ answer: message });
-};
-
-var context = "";
+let context = "";
 chrome.runtime.onMessage.addListener(async function (request) {
   // check if the request contains a message that the user sent a new message
   if (request.input) {
-    var inp = request.input;
+    let inp = request.input;
     if (context.length > 0) {
-      inp = "Use only the following context when answering the question at the end. Don't use any other knowledge.\n"+ context + "\n\nQuestion: " + request.input + "\n\nHelpful Answer: ";
+      inp = "Use only the following context when answering the question at the end. Don't use any other knowledge.\n" + context + "\n\nQuestion: " + request.input + "\n\nHelpful Answer: ";
     }
     console.log("Input:", inp);
-    const response = await cm.generate(inp, generateProgressCallback);
+    chatHistory.push({ "role": "user", "content": inp });
+
+    let curMessage = "";
+    const completion = await engine.chat.completions.create({ stream: true, messages: chatHistory });
+    for await (const chunk of completion) {
+      const curDelta = chunk.choices[0].delta.content;
+      if (curDelta) {
+        curMessage += curDelta;
+      }
+      chrome.runtime.sendMessage({ answer: curMessage });
+    }
+    chatHistory.push({ "role": "assistant", "content": await engine.getMessage() });
   }
   if (request.context) {
     context = request.context;
     console.log("Got context:", context);
   }
   if (request.reload) {
     if (!model_loaded) {
-      var appConfig = request.reload;
+      const appConfig = request.reload;
       console.log("Got appConfig: ", appConfig);
-
-      cm.setInitProgressCallback((report: InitProgressReport) => {
+      const initProgressCallback = (report: InitProgressReport) => {
        console.log(report.text, report.progress);
-        chrome.runtime.sendMessage({ initProgressReport: report.progress});
-      });
-
-      await cm.reload("Mistral-7B-Instruct-v0.2-q4f16_1", undefined, appConfig);
+        chrome.runtime.sendMessage({ initProgressReport: report.progress });
+      }
+      // const selectedModel = "TinyLlama-1.1B-Chat-v0.4-q4f16_1-1k";
+      const selectedModel = "Mistral-7B-Instruct-v0.2-q4f16_1";
+      engine = await CreateEngine(
+        selectedModel,
+        { appConfig: appConfig, initProgressCallback: initProgressCallback }
+      );
       console.log("Model loaded");
       model_loaded = true;
     } else {
-      chrome.runtime.sendMessage({ initProgressReport: 1.0});
+      chrome.runtime.sendMessage({ initProgressReport: 1.0 });
     }
   }
 });
````
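For context, the receiving side of these messages (e.g. the extension's popup or a content script, not part of this diff) would look roughly like the hypothetical sketch below; only the `answer` and `initProgressReport` fields are taken from the background script above:

```typescript
// Hypothetical receiver sketch; field names match what the background script sends.
chrome.runtime.onMessage.addListener((message: any) => {
  if (message.initProgressReport !== undefined) {
    console.log("Loading progress:", message.initProgressReport);
  }
  if (message.answer !== undefined) {
    console.log("Partial answer so far:", message.answer);
  }
});
```
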

## examples/chrome-extension-webgpu-service-worker/src/popup.ts

Lines changed: 3 additions & 7 deletions
````diff
@@ -5,7 +5,7 @@
 
 import './popup.css';
 
-import { ChatModule, AppConfig, InitProgressReport } from "@mlc-ai/web-llm";
+import { AppConfig } from "@mlc-ai/web-llm";
 import { prebuiltAppConfig } from '@mlc-ai/web-llm';
 import { ProgressBar, Line } from "progressbar.js";
 
@@ -17,16 +17,12 @@ const queryInput = document.getElementById("query-input")!;
 const submitButton = document.getElementById("submit-button")!;
 
 
-var isLoadingParams = false;
-
-const generateProgressCallback = (_step: number, message: string) => {
-  updateAnswer(message);
-};
+let isLoadingParams = false;
 
 
 (<HTMLButtonElement>submitButton).disabled = true;
 
-var progressBar: ProgressBar = new Line('#loadingContainer', {
+const progressBar: ProgressBar = new Line('#loadingContainer', {
   strokeWidth: 4,
   easing: 'easeInOut',
   duration: 1400,
````
