check in tokenizer.model for ease of dev setup (#59)
It's a small file and can be downloaded from OSS anyway, so we can just check it in.

[ghstack-poisoned]
H-Huang committed Aug 20, 2024
1 parent 08b607e commit 4119639
Showing 3 changed files with 3 additions and 6 deletions.
.gitignore: 1 change (0 additions, 1 deletion)

````diff
@@ -9,5 +9,4 @@ outputs
 data
 out
 wandb
-*.model
 *.json
````
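Removing `*.model` from `.gitignore` is what allows Git to track the new tokenizer file. The following sketch approximates that effect with Python's `fnmatch` module. It is not Git's matcher: it handles only slash-free patterns like the ones in this hunk (no negation or anchoring), and the pattern lists hold only the five entries visible in the hunk, not the whole file.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List

# Only the entries visible in the .gitignore hunk, before and after this commit.
OLD_PATTERNS = ["data", "out", "wandb", "*.model", "*.json"]
NEW_PATTERNS = ["data", "out", "wandb", "*.json"]


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate gitignore matching for patterns that contain no slash.

    Such patterns are compared against every path component, so a match on a
    parent directory name ignores everything below it. Negated ("!pattern")
    and anchored patterns are deliberately not handled in this sketch.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pat) for part in parts for pat in patterns)


tokenizer = "torchtrain/datasets/tokenizer/tokenizer.model"
print(is_ignored(tokenizer, OLD_PATTERNS))  # True: hidden by "*.model"
print(is_ignored(tokenizer, NEW_PATTERNS))  # False: Git can now track it
```

To check Git's actual rules, `git check-ignore -v <path>` reports which pattern, if any, ignores a given path.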
README.md: 8 changes (3 additions, 5 deletions)

````diff
@@ -6,17 +6,15 @@ torchtrain contains PyTorch native parallelisms, tools and utilities to train la
 
 # Installation
 
-install PyTorch from source or install the latest pytorch nightly, then install requirements by
+Install PyTorch from source or install the latest pytorch nightly, then install requirements by
 
 ```python
 pip install -r requirements.txt
 ```
 
-download tokenizer from HF
-This part is needed first time if there's no tokenizer locally by run:
-
+Install additional dev requirements if you want to contribute to the repo:
 ```
-python torchtrain/datasets/download_tokenizer.py --hf_token your_token
+pip install -r dev-requirements.txt
 ```
 
 run the llama debug model locally to verify the setup is correct:
````
Binary file added torchtrain/datasets/tokenizer/tokenizer.model
Binary file not shown.
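With the model file in the repository, dev setup no longer needs the Hugging Face download step that this commit drops from the README. A minimal sketch of how a script might locate the checked-in tokenizer and fail with a pointer to the old download command when the file is missing (for example, on a checkout that predates this commit). `resolve_tokenizer_path` is a hypothetical helper, not part of torchtrain; the relative path and the download command are copied from this commit.

```python
from pathlib import Path
from typing import Union

# Location of the binary file added by this commit, relative to the repo root.
TOKENIZER_RELPATH = Path("torchtrain/datasets/tokenizer/tokenizer.model")

# The download command that this commit removed from the README.
DOWNLOAD_HINT = (
    "python torchtrain/datasets/download_tokenizer.py --hf_token your_token"
)


def resolve_tokenizer_path(repo_root: Union[str, Path]) -> Path:
    """Hypothetical helper: return the checked-in tokenizer path or fail loudly."""
    path = Path(repo_root) / TOKENIZER_RELPATH
    if not path.is_file():
        # Older checkouts lack the file, so point at the previous setup step.
        raise FileNotFoundError(
            f"tokenizer not found at {path}; on checkouts older than this "
            f"commit, fetch it with: {DOWNLOAD_HINT}"
        )
    return path
```

If the file is a SentencePiece model (an assumption the `.model` extension suggests but this page does not confirm), the returned path could then be passed to `sentencepiece.SentencePieceProcessor(model_file=...)`.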
