
Commit 9c47edc

README: Minor typos + Upsell Server (pytorch#1131)
1 parent bc3a365 commit 9c47edc

File tree

1 file changed (+15, -16 lines changed)


README.md

+15, -16
@@ -10,7 +10,7 @@ torchchat is a small codebase showcasing the ability to run large language model
 - [Run chat in the Browser](#browser)
 - [Run models on desktop/server without python](#desktopserver-execution)
 - [Use AOT Inductor for faster execution](#aoti-aot-inductor)
-- [Running in c++ using the runner](#running-native-using-our-c-runner)
+- [Running in c++ using the runner](#run-using-our-c-runner)
 - [Run models on mobile](#mobile-execution)
 - [Deploy and run on iOS](#deploy-and-run-on-ios)
 - [Deploy and run on Android](#deploy-and-run-on-android)
@@ -33,7 +33,8 @@ torchchat is a small codebase showcasing the ability to run large language model
 ## Installation
 The following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.
 
-*torchchat uses the latest changes from various PyTorch projects so it's highly recommended that you use a venv (by using the commands below) or CONDA.*
+> [!TIP]
+> torchchat uses the latest changes from various PyTorch projects so it's highly recommended that you use a venv (by using the commands below) or CONDA.
 
 [skip default]: begin
 ```bash
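The tip added in this hunk points at setup commands that sit just below the hunk boundary in the README and are not shown here. As a rough sketch only, a typical venv setup along those lines might look like the following; the `.venv` directory name is an assumption and the actual install commands are the ones the README lists below this hunk:

```bash
# Assumed sketch, not the README's exact commands:
# create and activate an isolated environment before installing torchchat's requirements.
python3 -m venv .venv
source .venv/bin/activate
# ...then run the install commands the README lists below this hunk.
```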
@@ -127,21 +128,21 @@ python3 torchchat.py download llama3.1
 <summary>Additional Model Inventory Management Commands</summary>
 
 ### List
-This subcommands shows the available models
+This subcommand shows the available models
 ```bash
 python3 torchchat.py list
 ```
 
 ### Where
-This subcommands shows location of a particular model.
+This subcommand shows location of a particular model.
 ```bash
 python3 torchchat.py where llama3.1
 ```
 This is useful in scripts when you do not want to hard-code paths
 
 
 ### Remove
-This subcommands removes the specified model
+This subcommand removes the specified model
 ```bash
 python3 torchchat.py remove llama3.1
 ```
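The context line in this hunk notes that `where` is useful in scripts when you do not want to hard-code paths. As an illustrative sketch only (this diff does not show the output format of `where`, so the assumption that it prints a directory on stdout is mine), a script might capture the location like this:

```bash
# Illustrative sketch: capture the model location rather than hard-coding a path.
# Assumes `where` prints the directory on stdout, which this diff does not show.
MODEL_DIR="$(python3 torchchat.py where llama3.1)"
echo "llama3.1 artifacts live in: ${MODEL_DIR}"
```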
@@ -181,18 +182,10 @@ python3 torchchat.py generate llama3.1 --prompt "write me a story about a boy an
 [skip default]: end
 
 ### Server
-**Note: This feature is still a work in progress and not all endpoints are working**
-
-
-<details>
-<summary>This mode gives a REST API that matches the OpenAI API spec for interacting with a model</summary>
-
+This mode exposes a REST API for interacting with a model.
 The server follows the [OpenAI API specification](https://platform.openai.com/docs/api-reference/chat) for chat completions.
-Since this feature is under active development, not every parameter is consumed. See api/api.py for details on
-which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).
 
 To test out the REST API, **you'll need 2 terminals**: one to host the server, and one to send the request.
-
 In one terminal, start the server
 
 [skip default]: begin
@@ -204,8 +197,14 @@ python3 torchchat.py server llama3.1
 
 In another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.
 
-Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will await the full response from the server.
+> [!NOTE]
+> Since this feature is under active development, not every parameter is consumed. See api/api.py for details on
+> which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).
 
+<details>
+<summary>Example Query</summary>
+
+Setting `stream` to "true" in the request emits a response in chunks. If `stream` is unset or not "true", then the client will await the full response from the server.
 
 **Example Input + Output**
 
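As a rough sketch of the query side this hunk describes: the request below follows the OpenAI chat completions shape the README references, but the host, port, endpoint path, and payload fields are assumptions for illustration and are not taken from this commit; the README's own Example Input + Output section is the authoritative form.

```bash
# Illustrative only: host, port, path, and fields are assumed, not from this commit.
# Setting "stream" to "true" asks for a chunked response, per the README text above.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1",
        "stream": "true",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```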
@@ -348,7 +347,7 @@ Specifically there are 2 ways of doing so: Pure Python and via a Runner
 
 ```
 # Execute
-python3 torchchat.py generate llama3.1 --device cpu --pte-path llama3.1.pte --prompt "Hello my name is"
+python3 torchchat.py generate llama3.1 --pte-path llama3.1.pte --prompt "Hello my name is"
 ```
 
 </details>
