Update ckpt loading #38

Merged
mandy-li merged 6 commits into habana-main from schoi/falcon_180b on Feb 14, 2024

@schoi-habana

PR#15 reads the set of ckpt file names from the index json file.
When OH downloads files from the hub instead of loading them from a cache dir, get_repo_root() skips downloading the index json file, so PR#15 fails to load the file names.
This PR instead scans the path and returns a list of the file names that match the pattern.
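
To illustrate the idea of scanning the path rather than relying on the index json, here is a minimal sketch. The function name `list_checkpoint_files` and the default pattern `*.bin` are assumptions made for illustration; they are not taken from the code in this PR.

```python
from pathlib import Path


def list_checkpoint_files(ckpt_dir, pattern="*.bin"):
    """Return the sorted paths of files in ckpt_dir whose names match pattern.

    Hypothetical helper: scans the directory directly, so it does not
    depend on an index json file being present.
    """
    return sorted(str(p) for p in Path(ckpt_dir).glob(pattern) if p.is_file())
```

Because the scan only looks at file names, it behaves the same for a cache dir and for a freshly downloaded directory, as long as the shard files themselves are there.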

Tested the models in cache dir:
Falcon-7b, Falcon-40b, Falcon-180b, Llama-70b

Tested the models downloaded from HF hub:
Falcon-7b, Falcon-40b, TinyLlama

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@schoi-habana schoi-habana marked this pull request as ready for review February 14, 2024 01:18
@mandy-li

@schoi-habana, please also test the downloaded ckpt for falcon-180b.

@puneeshkhanna

@schoi-habana - Will this fix llama2? Right now, without your change, we see the error below:

pickle.UnpicklingError: Memo value not found at index 107
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 626, in <module>
main()
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 277, in main
model, tokenizer, generation_config = initialize_model(args, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 374, in initialize_model
else setup_distributed_model(args, model_dtype, model_kwargs, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 235, in setup_distributed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 154, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 333, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 571, in replace_module
sd = torch.load(checkpoint, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1026, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1244, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: Memo value not found at index 107
[2024-02-14 06:40:09,891] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.4+f4fa754c, git-hash=f4fa754c, git-branch=HEAD
[2024-02-14 06:40:09,892] [INFO] [logging.py:96:log_dist] [Rank 0] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 626, in <module>
main()
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 277, in main
model, tokenizer, generation_config = initialize_model(args, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 374, in initialize_model
else setup_distributed_model(args, model_dtype, model_kwargs, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 235, in setup_distributed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 154, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 333, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 571, in replace_module
sd = torch.load(checkpoint, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1026, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1244, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'x'.

@puneeshkhanna

Please let me know whether a 70b command such as this one works:
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path /software/data/llama_inference/Llama-2-70b-hf/ --use_hpu_graphs --use_kv_cache --max_input_tokens 128 --max_new_tokens 1024 --batch_size 16 --attn_softmax_bf16 --trim_logits --bf16 --reuse_cache --warmup 2 --n_iterations 2 --limit_hpu_graphs

@schoi-habana
Author

@puneeshkhanna it may also be related to the DeepSpeed version. I'm aware of that error but couldn't reproduce it in my test env. Which DS version is used in QA?

@puneeshkhanna

puneeshkhanna commented Feb 14, 2024

@schoi-habana - I'm using the latest master of deepspeed-fork.
DeepSpeed info: version=0.12.4+f4fa754c, git-hash=f4fa754c

Below are the contents of the checkpoints:
ls -al /mnt/weka/data/pytorch/llama2/Llama-2-70b-hf/
total 269444208
drwxrwxr-x 1 2002 2002 0 Oct 18 12:39 .
drwxrwxr-x 1 2002 2002 0 Oct 18 12:39 ..
-rw-rw-r-- 1 2002 2002 609 Oct 18 13:05 config.json
-rw-rw-r-- 1 2002 2002 167 Oct 18 13:24 generation_config.json
-rw-rw-r-- 1 2002 2002 7020 Oct 18 13:09 LICENSE.txt
-rw-rw-r-- 1 2002 2002 9852591960 Oct 18 13:14 model-00001-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099016 Oct 18 13:11 model-00002-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:24 model-00003-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:09 model-00004-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:28 model-00005-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:17 model-00006-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:41 model-00007-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:27 model-00008-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:30 model-00009-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:22 model-00010-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:20 model-00011-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:43 model-00012-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:37 model-00013-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9496124816 Oct 18 13:50 model-00014-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 524288128 Oct 18 13:25 model-00015-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 66725 Oct 18 13:24 model.safetensors.index.json
-rw-rw-r-- 1 2002 2002 9852605685 Oct 18 13:34 pytorch_model-00001-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:13 pytorch_model-00002-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:19 pytorch_model-00003-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:10 pytorch_model-00004-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:18 pytorch_model-00005-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:08 pytorch_model-00006-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:13 pytorch_model-00007-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:37 pytorch_model-00008-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:10 pytorch_model-00009-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:21 pytorch_model-00010-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:23 pytorch_model-00011-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:26 pytorch_model-00012-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:19 pytorch_model-00013-of-00015.bin
-rw-rw-r-- 1 2002 2002 9496138113 Oct 18 13:18 pytorch_model-00014-of-00015.bin
-rw-rw-r-- 1 2002 2002 524288938 Oct 18 13:21 pytorch_model-00015-of-00015.bin
-rw-rw-r-- 1 2002 2002 66725 Oct 18 13:24 pytorch_model.bin.index.json
-rw-rw-r-- 1 2002 2002 10372 Oct 18 13:23 README.md
-rw-rw-r-- 1 2002 2002 1253223 Oct 18 13:09 Responsible-Use-Guide.pdf
-rw-rw-r-- 1 2002 2002 414 Oct 18 13:41 special_tokens_map.json
-rw-rw-r-- 1 2002 2002 776 Oct 18 13:27 tokenizer_config.json
-rw-rw-r-- 1 2002 2002 1842767 Oct 18 13:34 tokenizer.json
-rw-rw-r-- 1 2002 2002 499723 Oct 18 13:18 tokenizer.model
-rw-rw-r-- 1 2002 2002 4766 Oct 18 13:05 USE_POLICY.md
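
For reference, the `*.index.json` files in a listing like the one above follow the Hugging Face sharded-checkpoint layout, in which a `weight_map` object maps each parameter name to the shard file that stores it. Below is a minimal sketch of collecting the shard names from such an index, which is roughly the approach described for PR#15. The function name `shard_names_from_index` is hypothetical and is not taken from the PR's code.

```python
import json


def shard_names_from_index(index_path):
    """Return the sorted, de-duplicated shard file names listed in an index json."""
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps parameter name -> shard file name.
    # Many parameters share one shard, so collapse the values into a set.
    return sorted(set(index["weight_map"].values()))
```

This only works when the index file is actually present on disk, which is the limitation the PR description points out for hub downloads.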

Comment thread optimum/habana/checkpoint_utils.py Outdated
@schoi-habana
Author

@puneeshkhanna DeepSpeed-fork git-hash=f4fa754c is not the latest; a later change is needed to load safetensors.
In any case, this PR loads the .bin files, so you don't need to check out the later DS-fork changes. Can you test this PR in your env?
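
As a rough sketch of what loading the bins could look like for a directory that holds both formats: pick the `.bin` shards when any exist and fall back to `.safetensors` otherwise. The helper name `select_shards` and the fallback order are assumptions for illustration, not the PR's actual logic.

```python
from pathlib import Path


def select_shards(ckpt_dir, preferred=(".bin", ".safetensors")):
    """Return shard file names for the first suffix in `preferred` that has matches.

    Hypothetical helper: index files such as pytorch_model.bin.index.json
    end in .json, so they are never returned as shards.
    """
    files = [p for p in Path(ckpt_dir).iterdir() if p.is_file()]
    for suffix in preferred:
        shards = sorted(p.name for p in files if p.suffix == suffix)
        if shards:
            return shards
    return []
```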

@mandy-li mandy-li self-requested a review February 14, 2024 19:58

@mandy-li mandy-li left a comment

LGTM

@mandy-li mandy-li merged commit 453b14a into habana-main Feb 14, 2024
bhargaveede pushed a commit that referenced this pull request Feb 19, 2024
* enable loading falcon-180b ckpt in .safetensors format

* Address comments borrowing transformer's way of reading ckpt file

* address comments

* Update ckpt loading

PR#15 reads a set of ckpt file names from the index json file.
When OH downloads files from the hub instead of loading from a cache dir, get_repo_root()
skips downloading the index json file. Thus the PR#15 fails to load file names.
This PR scans the path and returns a list of names that matches the pattern

* import modeling_utils from transformers
@astachowiczhabana

huggingface#773

astachowiczhabana added a commit that referenced this pull request Nov 28, 2024
…39)

Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>