README.md (+15 −16)
@@ -10,7 +10,7 @@ torchchat is a small codebase showcasing the ability to run large language model
 - [Run chat in the Browser](#browser)
 - [Run models on desktop/server without python](#desktopserver-execution)
 - [Use AOT Inductor for faster execution](#aoti-aot-inductor)
-- [Running in c++ using the runner](#running-native-using-our-c-runner)
+- [Running in c++ using the runner](#run-using-our-c-runner)
 - [Run models on mobile](#mobile-execution)
 - [Deploy and run on iOS](#deploy-and-run-on-ios)
 - [Deploy and run on Android](#deploy-and-run-on-android)
@@ -33,7 +33,8 @@ torchchat is a small codebase showcasing the ability to run large language model
 ## Installation
 The following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.
 
-*torchchat uses the latest changes from various PyTorch projects so it's highly recommended that you use a venv (by using the commands below) or CONDA.*
+> [!TIP]
+> torchchat uses the latest changes from various PyTorch projects so it's highly recommended that you use a venv (by using the commands below) or CONDA.
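The tip above points to "the commands below", which fall outside this hunk. As a rough sketch of what that setup typically looks like (the installer script name is an assumption, not taken from this diff):

```bash
# Minimal sketch of the recommended venv setup; the authoritative commands
# appear later in the README and are not part of this diff.
python3 -m venv .venv
source .venv/bin/activate
./install_requirements.sh   # assumed installer script; verify against the repo
```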
@@ -127,21 +128,21 @@
 <summary>Additional Model Inventory Management Commands</summary>
 
 ### List
-This subcommands shows the available models
+This subcommand shows the available models
 ```bash
 python3 torchchat.py list
 ```
 
 ### Where
-This subcommands shows location of a particular model.
+This subcommand shows location of a particular model.
 ```bash
 python3 torchchat.py where llama3.1
 ```
 This is useful in scripts when you do not want to hard-code paths
 
 
 ### Remove
-This subcommands removes the specified model
+This subcommand removes the specified model
 ```bash
 python3 torchchat.py remove llama3.1
 ```
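The "useful in scripts" note about `where` implies capturing the command's output instead of hard-coding a path. A minimal sketch (the variable name and follow-up command are illustrative only):

```bash
# Resolve the model's location once and reuse it in a script.
MODEL_DIR="$(python3 torchchat.py where llama3.1)"
ls "$MODEL_DIR"   # e.g. inspect the downloaded artifacts
```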
@@ -181,18 +182,10 @@ python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy an
 [skip default]: end
 
 ### Server
-**Note: This feature is still a work in progress and not all endpoints are working**
-
-
-<details>
-<summary>This mode gives a REST API that matches the OpenAI API spec for interacting with a model</summary>
-
+This mode exposes a REST API for interacting with a model.
 The server follows the [OpenAI API specification](https://platform.openai.com/docs/api-reference/chat) for chat completions.
-Since this feature is under active development, not every parameter is consumed. See api/api.py for details on
-which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).
 
 To test out the REST API, **you'll need 2 terminals**: one to host the server, and one to send the request.
-
 In one terminal, start the server
 
 [skip default]: begin
@@ -204,8 +197,14 @@ python3 torchchat.py server llama3.1
 
 In another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.
 
-Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will await the full response from the server.
+> [!NOTE]
+> Since this feature is under active development, not every parameter is consumed. See api/api.py for details on
+> which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).
 
+<details>
+<summary>Example Query</summary>
+
+Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will await the full response from the server.
 
 **Example Input + Output**
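The new "Example Query" block refers to a request body that sits outside this hunk. A hedged sketch of a streaming chat-completions request, assuming the server listens on 127.0.0.1:5000 and following the OpenAI spec linked above (host, port, and payload fields are assumptions, not taken from this diff):

```bash
# Hypothetical streaming request against the torchchat server; check the README's
# own "Example Input + Output" section for the canonical request.
curl http://127.0.0.1:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1",
        "stream": "true",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```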
@@ -348,7 +347,7 @@ Specifically there are 2 ways of doing so: Pure Python and via a Runner
 
 ```
 # Execute
-python3 torchchat.py generate llama3.1 --device cpu --pte-path llama3.1.pte --prompt "Hello my name is"
+python3 torchchat.py generate llama3.1 --pte-path llama3.1.pte --prompt "Hello my name is"
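The `--pte-path` flag above presumes a previously exported `llama3.1.pte`. As a sketch of where that file would come from (the `--output-pte-path` flag name is an assumption based on torchchat's export workflow, not shown in this diff):

```bash
# Assumed export step that produces the .pte file consumed by the generate command above;
# confirm the exact flags in the README's export section.
python3 torchchat.py export llama3.1 --output-pte-path llama3.1.pte
```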