Process readme (pytorch#665)
* executable README

* fix title of CI workflow

* markup commands in markdown

* extend the markup-markdown language

* Automatically identify cuda from nvidia-smi in install-requirements (pytorch#606)

* Automatically identify cuda from nvidia-smi in install-requirements

* Update README.md

---------

Co-authored-by: Michael Gschwind <[email protected]>

* Unbreak zero-temperature sampling (pytorch#599)

Fixes pytorch#581.

* Improve process README

* [retake] Add sentencepiece tokenizer (pytorch#626)

* Add sentencepiece tokenizer

* Add white space

* Handle white space

* Handle control ids

* More cleanup

* Lint

* Use unique_ptr

* Use a larger runner

* Debug

* Debug

* Cleanup

* Update install_utils.sh to use python3 instead of python (pytorch#636)

As titled. On some devices `python` and `python3` point to different environments, so it is good to unify them.
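
The `install_utils.sh` hunk itself is not among the files shown in this commit, so the change described above is only summarized here. A hypothetical sketch of the kind of substitution it describes (not the actual file contents; the file names below are illustrative):

```
# Hypothetical illustration only: invoke python3 explicitly so the install
# scripts do not depend on what `python` happens to resolve to on a machine.
if ! command -v python3 >/dev/null 2>&1; then
  echo "python3 is required but was not found" >&2
  exit 1
fi
python3 -m pip install -r requirements.txt
```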

* Fix quantization doc to specify dtype limitation on a8w4dq (pytorch#629)

Co-authored-by: Kimish Patel <[email protected]>

* add desktop.json (pytorch#622)

* add desktop.json

* add fast

* remove embedding

* improvements

* update readme from doc branch

* tab/spc

* fix errors in updown language

* fix errors in updown language, and [skip]: begin/end

* fix errors in updown language, and [skip]: begin/end

* a storied run

* stories run on readme instructions does not need HF token

* increase timeout

* check for hang in hf_login

* executable README improvements

* typo

* typo

---------

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
6 people authored and malfet committed Jul 17, 2024
1 parent 824a7ab commit 86c50df
Showing 6 changed files with 24 additions and 14 deletions.
1 change: 1 addition & 0 deletions .github/workflows/run-readme2.yml
@@ -27,6 +27,7 @@ jobs:
echo "::group::Create script"
python3 scripts/process-readme.py > ./readme-commands.sh
echo "exit 1" >> ./readme-commands.sh
echo "::endgroup::"
echo "::group::Run This"
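Taken together with the `scripts/process-readme.py` change at the end of this diff, the appended `exit 1` acts as a guard: the script generated from the README exits successfully only if processing reaches the README's `[end default]:` marker, which makes the generator print `exit 0` ahead of the trailing `exit 1`. The rest of this workflow step is not shown here; presumably it ends by executing the generated script, along these lines (hypothetical command, not taken from the diff):

```
bash -x ./readme-commands.sh
```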
26 changes: 18 additions & 8 deletions README.md
@@ -80,15 +80,18 @@ HuggingFace.
python3 torchchat.py download llama3
```

*NOTE: This command may prompt you to request access to llama3 via HuggingFace, if you do not already have access. Simply follow the prompts and re-run the command when access is granted.*
*NOTE: This command may prompt you to request access to llama3 via
HuggingFace, if you do not already have access. Simply follow the
prompts and re-run the command when access is granted.*

View available models with:
```
python3 torchchat.py list
```

You can also remove downloaded models with the remove command:
`python3 torchchat.py remove llama3`

You can also remove downloaded models with the remove command: `python3 torchchat.py remove llama3`


## Running via PyTorch / Python
@@ -111,15 +114,15 @@ python3 torchchat.py generate llama3 --prompt "write me a story about a boy and

For more information run `python3 torchchat.py generate --help`

[end default]:

### Browser

[shell default]: if false; then
[skip default]: begin
```
python3 torchchat.py browser llama3
```
[shell default]: fi
[skip default]: end


*Running on http://127.0.0.1:5000* should be printed out on the
terminal. Click the link or go to
@@ -139,9 +142,15 @@ conversation.
AOT compiles models before execution for faster inference

The following example exports and executes the Llama3 8B Instruct
model. The first command performs the actual export, the second
command loads the exported model into the Python interface to enable
users to test the exported model.
```
# Compile
@@ -152,9 +161,10 @@ python3 torchchat.py export llama3 --output-dso-path exportedModels/llama3.so
python3 torchchat.py generate llama3 --dso-path exportedModels/llama3.so --prompt "Hello my name is"
```

NOTE: If you're machine has cuda add this flag for performance
NOTE: If your machine has cuda add this flag for performance
`--quantize config/data/cuda.json`

[end default]: end
### Running native using our C++ Runner

The end-to-end C++ [runner](runner/run.cpp) runs an `*.so` file
@@ -167,7 +177,7 @@ scripts/build_native.sh aoti

Execute
```bash
cmake-out/aoti_run exportedModels/llama3.so -z .model-artifacts/meta-llama/Meta-Llama-3-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time"
cmake-out/aoti_run exportedModels/llama3.so -z ~/.torchchat/model-cache/meta-llama/Meta-Llama-3-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time"
```

[end default]:
@@ -243,9 +253,9 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t
<img src="https://pytorch.org/executorch/main/_static/img/llama_ios_app.png" width="600" alt="iOS app running a LlaMA model">
</a>


### Deploy and run on Android

MISSING. TBD.



1 change: 0 additions & 1 deletion build/utils.py
@@ -10,7 +10,6 @@
import os
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

import torch

##########################################################################
3 changes: 1 addition & 2 deletions config/data/desktop.json
@@ -1,5 +1,4 @@
{
"executor": {"accelerator": "fast" },
"executor": {"accelerator": "fast"},
"precision": {"dtype" : "fast16"},
"linear:int4": {"groupsize" : 256}
}
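
The new `config/data/desktop.json` mirrors the way `config/data/cuda.json` is referenced in the README above; presumably it is passed through the same `--quantize` flag, e.g. (a hedged usage sketch, not a command taken from this commit):

```
python3 torchchat.py export llama3 --quantize config/data/desktop.json --output-dso-path exportedModels/llama3.so
```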
4 changes: 2 additions & 2 deletions docs/quantization.md
@@ -22,7 +22,7 @@ Due to the larger vocabulary size of llama3, we also recommend quantizing the em
|--|--|--|--|--|--|--|--|
| embedding (symmetric) | fp32, fp16, bf16 | [8, 4]* | [32, 64, 128, 256]** | ||||

^ The a8w4dq quantization scheme requires inouts to be converted to fp32, due to lack of support for fp16 and bf16.
^a8w4dq quantization scheme requires model to be converted to fp32, due to lack of support for fp16 and bf16 in the kernels provided with ExecuTorch.

* These are the only valid bitwidth options.

@@ -82,7 +82,7 @@ python3 generate.py llama3 --dso-path llama3.dso --prompt "Hello my name is
```
### ExecuTorch
```
python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
python3 torchchat.py export llama3 --dtype fp32 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
python3 generate.py llama3 --pte-path llama3.pte --prompt "Hello my name is"
```
3 changes: 2 additions & 1 deletion scripts/process-readme.py
@@ -15,6 +15,7 @@ def print_between_triple_backticks(filename, predicate):
elif line.startswith(command):
print(line[len(command) :])
elif line.startswith(end):
print("exit 0")
return
elif line.startswith(skip):
keyword = line[len(skip):-1].strip()
@@ -34,6 +35,6 @@ def print_between_triple_backticks(filename, predicate):
if len(sys.argv) > 1:
predicate = sys.argv[1]
else:
predicate = "default"
predicate="default"

print_between_triple_backticks("README.md", predicate)
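
Only fragments of `scripts/process-readme.py` appear in this diff. The sketch below is a hedged reconstruction of how the pieces might fit together; the marker formats (`[shell <predicate>]:`, `[end <predicate>]:`, `[skip <predicate>]: begin/end`) and the surrounding control flow are assumptions inferred from the README markup and the `command`, `end`, and `skip` branches visible above, not the actual file:

```
# Hedged sketch; only the elif branches shown in the diff are taken from the
# real script, everything else (marker strings, loop structure) is assumed.
import sys

def print_between_triple_backticks(filename, predicate):
    fence = "`" * 3                     # a line of three backticks opens/closes a code block
    command = f"[shell {predicate}]: "  # assumed: hidden shell-directive marker
    end = f"[end {predicate}]:"         # assumed: stop-processing marker
    skip = f"[skip {predicate}]: "      # assumed: begin/end skip-block marker
    in_code_block = False
    skipping = False
    with open(filename) as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith(fence):
                in_code_block = not in_code_block  # toggle on every fence
            elif skipping:
                if line.startswith(skip) and line[len(skip):].strip() == "end":
                    skipping = False               # leave the skipped region
            elif in_code_block:
                print(line)                        # fenced commands are emitted verbatim
            elif line.startswith(command):
                print(line[len(command):])         # hidden shell directives are emitted too
            elif line.startswith(end):
                print("exit 0")                    # succeed before the appended "exit 1"
                return
            elif line.startswith(skip):
                if line[len(skip):].strip() == "begin":
                    skipping = True                # suppress output until the matching end

if __name__ == "__main__":
    predicate = sys.argv[1] if len(sys.argv) > 1 else "default"
    print_between_triple_backticks("README.md", predicate)
```

With the workflow change at the top of this diff, `python3 scripts/process-readme.py > ./readme-commands.sh` turns the README into a shell script that CI can execute directly.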
