Update docs on how to interact with local LLM #1128

Merged (22 commits) on Jul 21, 2023

Commits:
a09fb16  Update docstring for oai.completion. (LeoLjl, Jul 10, 2023)
daa65b7  Merge branch 'main' into main (LeoLjl, Jul 10, 2023)
641da59  Merge branch 'microsoft:main' into main (LeoLjl, Jul 10, 2023)
34a973b  Merge branch 'microsoft:main' into main (LeoLjl, Jul 13, 2023)
a9f907d  Update docs about how to interact with local LLMs (LeoLjl, Jul 14, 2023)
1a6fe3f  Update docs about how to interact with local LLMs (LeoLjl, Jul 14, 2023)
effa544  Merge branch 'main' of https://github.com/LeoLjl/FLAML into main (LeoLjl, Jul 14, 2023)
79b3cdb  Merge branch 'microsoft:main' into main (LeoLjl, Jul 14, 2023)
f4bab94  Reformat file. (LeoLjl, Jul 14, 2023)
27df6ba  Merge branch 'main' of https://github.com/LeoLjl/FLAML into main (LeoLjl, Jul 14, 2023)
174c931  Fix issues. (LeoLjl, Jul 15, 2023)
3fdc09b  Update website/blog/2023-07-14-Local-LLMs/index.mdx (LeoLjl, Jul 15, 2023)
06e1ef5  Update website/blog/2023-07-14-Local-LLMs/index.mdx (LeoLjl, Jul 15, 2023)
6335790  Update website/docs/Use-Cases/Auto-Generation.md (LeoLjl, Jul 15, 2023)
a9e18a9  Add documents about multiple workers. (LeoLjl, Jul 15, 2023)
623d760  Merge branch 'main' of https://github.com/LeoLjl/FLAML (LeoLjl, Jul 15, 2023)
80b8459  Update user instructions. (LeoLjl, Jul 16, 2023)
3edc33d  Merge branch 'main' into main (LeoLjl, Jul 17, 2023)
98bd46d  Label big fix as optional (LeoLjl, Jul 18, 2023)
4d890bf  Merge branch 'main' of https://github.com/LeoLjl/FLAML into main (LeoLjl, Jul 18, 2023)
081e5ef  Merge branch 'main' into main (LeoLjl, Jul 18, 2023)
ea0f125  Update website/blog/2023-07-14-Local-LLMs/index.mdx (LeoLjl, Jul 18, 2023)
147 changes: 147 additions & 0 deletions website/blog/2023-07-14-Local-LLMs/index.mdx
@@ -0,0 +1,147 @@
---
title: Use flaml.autogen for local LLMs
authors: jialeliu
tags: [LLM, FLAMLv2]
---
**TL;DR:**
We demonstrate how to use flaml.autogen for local LLM applications. As an example, we will initiate an endpoint using [FastChat](https://github.com/lm-sys/FastChat) and perform inference on [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B).

## Preparations

### Clone FastChat

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs. However, its code may need a minor modification in order to function properly.

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```

### Download checkpoint

ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.

Before downloading from HuggingFace Hub, you need to have Git LFS [installed](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage).
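
If Git LFS is not already set up, a minimal sketch on a Debian/Ubuntu machine (assuming the `git-lfs` apt package; see the link above for other platforms) is:

```bash
# Install Git LFS and enable it for the current user
sudo apt-get install git-lfs
git lfs install
```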

```bash
git clone https://huggingface.co/THUDM/chatglm2-6b
```

## Initiate server

First, launch the controller:

```bash
python -m fastchat.serve.controller
```

Then, launch the model worker(s):

```bash
python -m fastchat.serve.model_worker --model-path chatglm2-6b
```

Finally, launch the RESTful API server:

```bash
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```

Normally this will work. However, if you encounter an error like [this](https://github.com/lm-sys/FastChat/issues/1641), commenting out all the lines containing `finish_reason` in `fastchat/protocol/api_protocal.py` and `fastchat/protocol/openai_api_protocol.py` will fix the problem. The modified code looks like:

```python
class CompletionResponseChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[int] = None
    # finish_reason: Optional[Literal["stop", "length"]]


class CompletionResponseStreamChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[float] = None
    # finish_reason: Optional[Literal["stop", "length"]] = None
```


## Interact with model using `oai.Completion`

Now the models can be accessed directly through the openai-python library, as well as through `flaml.oai.Completion` and `flaml.oai.ChatCompletion`.
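
For instance, with the legacy openai-python (v0.x) interface, a request might look like the following sketch; the endpoint URL and placeholder API key mirror the server configuration above:

```python
import openai

# Point the legacy openai-python (v0.x) client at the local FastChat endpoint
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "NULL"  # placeholder; the local server does not validate it

response = openai.ChatCompletion.create(
    model="chatglm2-6b",
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```

The same requests can be sent through `flaml.oai`: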


```python
from flaml import oai

# create a text completion request
response = oai.Completion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",  # just a placeholder
        }
    ],
    prompt="Hi",
)
print(response)

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```

If you would like to switch to different models, download their checkpoints and specify the model path when launching the model worker(s).
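
For example, to serve Vicuna instead, a sketch of the worker launch would be (assuming the `lmsys/vicuna-7b-v1.3` checkpoint, which FastChat can resolve from the HuggingFace Hub):

```bash
# Launch a model worker for a different checkpoint
python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3
```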

## Interacting with multiple local LLMs

If you would like to interact with multiple LLMs on your local machine, replace the `model_worker` step above with the multi-model variant:

```bash
python -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --model-names vicuna-7b-v1.3 \
    --model-path chatglm2-6b \
    --model-names chatglm2-6b
```

The inference code would be:

```python
from flaml import oai

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
        {
            "model": "vicuna-7b-v1.3",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
```

## For Further Reading

* [Documentation](/docs/Use-Cases/Auto-Generation) about `flaml.autogen`
* [Documentation](https://github.com/lm-sys/FastChat) about FastChat
6 changes: 6 additions & 0 deletions website/blog/authors.yml
@@ -9,3 +9,9 @@ qingyunwu:
  title: Assistant Professor at the Pennsylvania State University
  url: https://qingyun-wu.github.io/
  image_url: https://github.com/qingyun-wu.png

jialeliu:
  name: Jiale Liu
  title: Undergraduate student at Xidian University
  url: https://leoljl.github.io
  image_url: https://github.com/LeoLjl/leoljl.github.io/blob/main/profile.jpg?raw=true
2 changes: 1 addition & 1 deletion website/docs/Use-Cases/Auto-Generation.md
@@ -249,7 +249,7 @@ The tuned config can be used to perform inference.
`flaml.oai.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API.
When chat models are used and `prompt` is given as the input to `flaml.oai.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API.

- For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI), and then use the same API to send a request.
+ For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI) or [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](../../blog/2023/07/14/Local-LLMs) for examples of how to run inference with local LLMs.

When only working with the chat-based models, `flaml.oai.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages.
