README.md (+29 −3)
@@ -27,7 +27,7 @@ Forget expensive NVIDIA GPUs, unify your existing devices into one powerful GPU:
 <div align="center">
 <h2>Update: Exo Supports Llama 3.1</h2>
 <p>Now the default models: run 8B, 70B and 405B parameter models on your own devices</p>
-<p><a href="https://github.com/exo-explore/exo/blob/main/exo/inference/mlx/models/sharded_llama.py">See the code</a></p>
+<p><a href="https://github.com/exo-explore/exo/blob/main/exo/inference/mlx/models/llama.py">See the code</a></p>
 </div>

 ## Get Involved
@@ -40,7 +40,7 @@ We also welcome contributions from the community. We have a list of bounties in
 ### Wide Model Support

-exo supports LLaMA ([MLX](exo/inference/mlx/models/sharded_llama.py) and [tinygrad](exo/inference/tinygrad/models/llama.py)) and other popular models.
+exo supports LLaMA ([MLX](exo/inference/mlx/models/llama.py) and [tinygrad](exo/inference/tinygrad/models/llama.py)) and other popular models.

 ### Dynamic Model Partitioning
@@ -111,7 +111,7 @@ The native way to access models running on exo is using the exo library with peer handles.
 exo starts a ChatGPT-like WebUI (powered by [tinygrad tinychat](https://github.com/tinygrad/tinygrad/tree/master/examples/tinychat)) on http://localhost:8000

-For developers, exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000/v1/chat/completions. Example with curl:
+For developers, exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000/v1/chat/completions. Examples with curl:
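A minimal sketch of such a request, assuming a Llama 3.1 8B model is being served (the exact model identifier may differ in your setup):

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "What is the meaning of exo?"}],
    "temperature": 0.7
  }'
```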

+# Note: we only support one image at a time right now. Supporting multiple images is possible. See: https://github.com/huggingface/transformers/blob/e68ec18ce224af879f22d904c7505a765fb77de3/docs/source/en/model_doc/llava.md?plain=1#L41
+# This follows the convention in https://platform.openai.com/docs/guides/vision
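A sketch of an image request in that vision-style format; the model name (llava-1.5-7b-hf) and the image URL here are illustrative assumptions, not confirmed by this diff:

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava-1.5-7b-hf",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
        ]
      }
    ],
    "temperature": 0.0
  }'
```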