check in tokenizer.model for ease of dev setup (#59)

it's a small file and can be downloaded from OSS, so we can just check it in [ghstack-poisoned]
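
With the tokenizer checked in, a fresh clone should already contain the file, so no download step is needed. A minimal sketch of how one might confirm that locally is below. The path is taken from the diff in this commit; the helper name `tokenizer_is_present` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Path taken from the diff in this commit; the helper below is hypothetical.
TOKENIZER_PATH = Path("torchtrain/datasets/tokenizer/tokenizer.model")


def tokenizer_is_present(repo_root: Path = Path(".")) -> bool:
    """Return True if the checked-in tokenizer file exists and is non-empty."""
    path = repo_root / TOKENIZER_PATH
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    # Run from the repository root.
    status = "found" if tokenizer_is_present() else "missing"
    print(f"{TOKENIZER_PATH}: {status}")
```

This only checks that the file exists and has content; it does not validate that the file is a loadable tokenizer model.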
pytorch · Aug 20, 2024 · 4119639
1 parent 08b607e

Showing 3 changed files with 3 additions and 6 deletions.
diff --git a/.gitignore b/.gitignore
@@ -9,5 +9,4 @@ outputs
 data
 out
 wandb
-*.model
 *.json
diff --git a/README.md b/README.md
@@ -6,17 +6,15 @@ torchtrain contains PyTorch native parallelisms, tools and utilities to train la
 
 # Installation
 
-install PyTorch from source or install the latest pytorch nightly, then install requirements by
+Install PyTorch from source or install the latest pytorch nightly, then install requirements by
 
 ```python
 pip install -r requirements.txt
 ```
 
-download tokenizer from HF
-This part is needed first time if there's no tokenizer locally by run:
-
+Install additional dev requirements if you want to contribute to the repo:
 ```
-python torchtrain/datasets/download_tokenizer.py --hf_token your_token
+pip install -r dev-requirements.txt
 ```
 
 run the llama debug model locally to verify the setup is correct:

diff --git a/torchtrain/datasets/tokenizer/tokenizer.model b/torchtrain/datasets/tokenizer/tokenizer.model