package exo as installable #470

Merged: 25 commits merged into exo-explore:main on Nov 19, 2024

Conversation

@ghost commented Nov 18, 2024

@AlexCheema (Contributor)

LGTM

@AlexCheema merged commit 0501efa into exo-explore:main on Nov 19, 2024

@OKHand-Zy commented Nov 20, 2024

I've pulled the latest main branch (1fa42f3) and the code runs, but I get a "not defined" error when exiting exo. exo still works, but it prints this message:

❯ exo  --inference-engine mlx --run-model llama-3.2-3b
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Selected inference engine: mlx

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Detected system: Apple Silicon Mac
Inference engine name after selection: mlx
Using inference engine: MLXDynamicShardInferenceEngine with shard downloader: HFShardDownloader
[59392, 62317, 62890, 61755, 51505, 54822, 59529, 58544, 58825, 58707, 54319, 59382, 57740, 55399, 62061, 56510, 61677, 54465, 58521]
Chat interface started:
 - http://172.20.10.8:52415
 - http://127.0.0.1:52415
ChatGPT API endpoint served at:
 - http://172.20.10.8:52415/v1/chat/completions
 - http://127.0.0.1:52415/v1/chat/completions
has_read=True, has_write=True
Processing prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Removing download task for Shard(model_id='llama-3.2-3b', start_layer=0, end_layer=27, n_layers=28): True

Generated response:
I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."<|eot_id|>
Received exit signal SIGTERM...
Thank you for using exo.

  _____  _____  
 / _ \ \/ / _ \ 
|  __/>  < (_) |
 \___/_/\_\___/ 
    
Cancelling 4 outstanding tasks
Traceback (most recent call last):
  File "/Users/ziyu/miniconda3/envs/exo/bin/exo", line 33, in <module>
    sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
  File "/Users/ziyu/RemoteFolder/ziyu-pr/exo/exo/main.py", line 247, in run
    loop.run_until_complete(shutdown(signal.SIGTERM, loop))
  File "/Users/ziyu/miniconda3/envs/exo/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/ziyu/RemoteFolder/ziyu-pr/exo/exo/helpers.py", line 249, in shutdown
    await server.stop()
NameError: name 'server' is not defined
╭────────────────────────────────────────────────────────────────────── Exo Cluster (1 node) ──────────────────────────────────────────────────────────────────────╮
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ 💬️ Who are you?                                                                                                                                                  │
│                                                                                                                                                                  │
│ 🤖 I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."<|eot_id|>                                               │
.....
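
For what it's worth, the failing call is await server.stop() inside shutdown() in exo/helpers.py, and it seems the module-level server name is never bound when exo is started through the installed console script. A rough sketch of the kind of guard that would avoid the crash (illustrative only, not exo's actual code and not the fix in #473):

```python
import asyncio

# Illustrative sketch only, not exo's helpers.py: pass the server into the
# shutdown coroutine (or tolerate None) instead of relying on a module-level
# `server` name that may never have been bound in this code path.
async def shutdown(sig, loop, server=None):
    print(f"Received exit signal {sig.name}...")
    if server is not None:
        await server.stop()
    tasks = [t for t in asyncio.all_tasks(loop) if t is not asyncio.current_task()]
    print(f"Cancelling {len(tasks)} outstanding tasks")
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    loop.stop()
```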

@ghost (Author) commented Nov 21, 2024

This is addressed in PR #473.

@OKHand-Zy

> This is addressed in PR #473.

Hi, I'd like to know if the change made on line 186 of exo/inference/mlx/sharded_utils.py is crucial.

Before: tokenizer = load_tokenizer(model_path, tokenizer_config)
After:  tokenizer = await resolve_tokenizer(model_path)

When I try to support local models, I encounter an issue, but it's resolved after reverting the change. So I'm wondering if this change is really necessary. If it's not that important, I'd prefer to use the old method.

@ghost (Author) commented Nov 21, 2024

The change was made by @dtnewman; for now I will revert it back to the old method.

@ghost mentioned this pull request on Nov 21, 2024
@AlexCheema (Contributor)

This change is correct. We should keep using resolve_tokenizer as it's async. We should not have sync blocking I/O code.

@OKHand-Zy, you will need to fix your code to work with resolve_tokenizer.
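
For context, the node's networking, API server, and inference coordination all run on one asyncio event loop (the loop you can see in the traceback above), so a synchronous tokenizer load freezes everything else while the files are read. A toy sketch of the difference, with made-up names rather than exo's actual code:

```python
import asyncio
import time

def load_tokenizer_blocking(model_path):
    # Stand-in for a synchronous tokenizer load (file reads, JSON parsing, ...).
    # While this runs on the event loop's thread, no other coroutine can progress.
    time.sleep(2)
    return f"tokenizer for {model_path}"

async def heartbeat():
    # Stands in for the other work a node does concurrently (discovery, HTTP, ...).
    for _ in range(4):
        print("still responsive")
        await asyncio.sleep(0.5)

async def main():
    # Calling load_tokenizer_blocking(...) directly here would freeze heartbeat()
    # for the full two seconds. Running it in a worker thread and awaiting the
    # result keeps the loop responsive, which is the behaviour an async
    # resolve_tokenizer-style helper gives the caller.
    tokenizer, _ = await asyncio.gather(
        asyncio.to_thread(load_tokenizer_blocking, "llama-3.2-3b"),
        heartbeat(),
    )
    print(tokenizer)

asyncio.run(main())
```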

@OKHand-Zy commented Nov 21, 2024

> This change is correct. We should keep using resolve_tokenizer as it's async. We should not have sync blocking I/O code.
>
> @OKHand-Zy, you will need to fix your code to work with resolve_tokenizer.

Okay, I'll try to modify my code to make resolve_tokenizer work properly.
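
Probably something along these lines: keep the async resolver for hub models and only fall back to the old synchronous loader for a local directory, wrapped in a worker thread so it never blocks the loop. The import paths and the local-directory check are my assumptions, not tested code:

```python
import asyncio
from pathlib import Path

# Assumed import locations; adjust to wherever these actually live.
from exo.inference.tokenizers import resolve_tokenizer
from mlx_lm.tokenizer_utils import load_tokenizer

async def get_tokenizer(model_path, tokenizer_config=None):
    if not Path(model_path).is_dir():
        # Hub model id: use the async resolver from the diff above.
        return await resolve_tokenizer(model_path)
    # Local model directory: fall back to the old synchronous loader, but keep
    # the blocking call off the event loop by running it in a worker thread.
    return await asyncio.to_thread(load_tokenizer, Path(model_path), tokenizer_config or {})
```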
