Update ckpt loading #38
@schoi-habana, please also test the downloaded ckpt for falcon-180b.
@schoi-habana - Will this solve the llama2 issue? Right now, without your change, we see the following error: pickle.UnpicklingError: Memo value not found at index 107
If it works, please share a 70b command.
PR#15 reads the set of ckpt file names from the index json file. When OH downloads files from the hub instead of loading them from a cache dir, get_repo_root() skips downloading the index json file, so PR#15 fails to load the file names. This PR instead scans the path and returns a list of the file names that match the pattern.
4f4b772 to f32ae33
@puneeshkhanna it may be related to the DeepSpeed version, too. I'm aware of that error but couldn't repro it in my test env. Which DS version is used in QA?
@schoi-habana - I'm using the latest master of deepspeed-fork. Below are the contents of the checkpoints:
@puneeshkhanna DeepSpeed-fork git-hash=f4fa754c is not the latest; there is a newer change that you need in order to load safetensors.
* enable loading falcon-180b ckpt in .safetensors format
* Address comments borrowing transformer's way of reading ckpt file
* address comments
* Update ckpt loading
  PR#15 reads a set of ckpt file names from the index json file. When OH downloads files from the hub instead of loading from a cache dir, get_repo_root() skips downloading the index json file. Thus the PR#15 fails to load file names. This PR scans the path and returns a list of names that matches the pattern
* import modeling_utils from transformers
PR#15 reads the set of ckpt file names from the index json file.
When OH downloads files from the hub instead of loading them from a cache dir, get_repo_root() skips downloading the index json file, so PR#15 fails to load the file names.
This PR instead scans the path and returns a list of the file names that match the pattern.
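For illustration only, the difference between the two approaches can be sketched roughly as below. This is not the code from this PR: the function names and the glob patterns are hypothetical, and the only assumptions about the index file are the usual Hugging Face file names (`model.safetensors.index.json`, `pytorch_model.bin.index.json`) and a `weight_map` entry that maps parameter names to shard file names.

```python
import json
from pathlib import Path

# Index file names conventionally used for sharded Hugging Face checkpoints.
INDEX_NAMES = ("model.safetensors.index.json", "pytorch_model.bin.index.json")


def checkpoint_files_from_index(model_dir):
    """Index-based lookup: return shard names from the index, or None if no index exists."""
    for index_name in INDEX_NAMES:
        index_path = Path(model_dir) / index_name
        if index_path.is_file():
            with index_path.open() as f:
                weight_map = json.load(f)["weight_map"]
            # Many parameters map to the same shard, so deduplicate.
            return sorted(set(weight_map.values()))
    return None


def checkpoint_files_from_scan(model_dir, patterns=("*.safetensors", "*.bin")):
    """Scan-based lookup: return file names in model_dir matching the first pattern that hits.

    The patterns here are an assumption made for this sketch.
    """
    root = Path(model_dir)
    for pattern in patterns:
        matches = sorted(p.name for p in root.glob(pattern) if p.is_file())
        if matches:
            return matches
    return []


def list_checkpoint_files(model_dir):
    """Prefer the index file; fall back to scanning when the index was not downloaded."""
    names = checkpoint_files_from_index(model_dir)
    return names if names is not None else checkpoint_files_from_scan(model_dir)
```

With a directory that holds only the shard files and no index json, the index lookup returns `None` and the scan still finds the shards, which is the situation described above for files downloaded from the hub.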
Tested the models in cache dir:
Falcon-7b, Falcon-40b, Falcon-180b, Llama-70b
Tested the models downloaded from HF hub:
Falcon-7b, Falcon-40b, TinyLlama