Update ckpt loading by schoi-habana · Pull Request #38 · HabanaAI/optimum-habana-fork

schoi-habana · 2024-02-14T01:17:38Z

PR#15 reads the set of ckpt file names from the index json file.
When OH downloads files from the hub instead of loading from a cache dir, get_repo_root() skips downloading the index json file, so PR#15 fails to load the file names.
This PR instead scans the path and returns a list of the file names that match the pattern.
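
As a rough illustration of the directory scan described above, here is a minimal sketch. The function name `find_checkpoint_files` and the regular expression are assumptions made for this example, not the code added in optimum/habana/checkpoint_utils.py; the pattern simply targets the usual `pytorch_model*.bin` naming.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches a single-file checkpoint (pytorch_model.bin)
# or a shard such as pytorch_model-00001-of-00015.bin.
SHARD_PATTERN = re.compile(r"^pytorch_model(-\d{5}-of-\d{5})?\.bin$")


def find_checkpoint_files(model_dir, pattern=SHARD_PATTERN):
    """Return the sorted names of files in model_dir whose name matches pattern."""
    return sorted(
        p.name
        for p in Path(model_dir).iterdir()
        if p.is_file() and pattern.match(p.name)
    )
```

Because the scan only looks at what is on disk, it does not depend on the index json file being present.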

Tested the models in cache dir:
Falcon-7b, Falcon-40b, Falcon-180b, Llama-70b

Tested the models downloaded from HF hub:
Falcon-7b, Falcon-40b, TinyLlama

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

mandy-li · 2024-02-14T01:41:47Z

@schoi-habana , please also test downloaded ckpt for falcon-180b

puneeshkhanna · 2024-02-14T04:49:01Z

@schoi-habana - Will this solve the llama2 case? Right now, without your change, we see the error below:

pickle.UnpicklingError: Memo value not found at index 107
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 626, in <module>
main()
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 277, in main
model, tokenizer, generation_config = initialize_model(args, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 374, in initialize_model
else setup_distributed_model(args, model_dtype, model_kwargs, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 235, in setup_distributed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 154, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 333, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 571, in replace_module
sd = torch.load(checkpoint, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1026, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1244, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: Memo value not found at index 107
[2024-02-14 06:40:09,891] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.4+f4fa754c, git-hash=f4fa754c, git-branch=HEAD
[2024-02-14 06:40:09,892] [INFO] [logging.py:96:log_dist] [Rank 0] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Loading 2 checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 626, in <module>
main()
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/run_generation.py", line 277, in main
model, tokenizer, generation_config = initialize_model(args, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 374, in initialize_model
else setup_distributed_model(args, model_dtype, model_kwargs, logger)
File "/software/users/pkhanna/optimum-habana-fork/examples/text-generation/utils.py", line 235, in setup_distributed_model
model = deepspeed.init_inference(model, **ds_inference_kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 154, in __init__
self._apply_injection_policy(config, client_module)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 333, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 571, in replace_module
sd = torch.load(checkpoint, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1026, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1244, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'x'.

puneeshkhanna · 2024-02-14T04:52:46Z

If a 70b run works for you, please share the command. Mine is:
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path /software/data/llama_inference/Llama-2-70b-hf/ --use_hpu_graphs --use_kv_cache --max_input_tokens 128 --max_new_tokens 1024 --batch_size 16 --attn_softmax_bf16 --trim_logits --bf16 --reuse_cache --warmup 2 --n_iterations 2 --limit_hpu_graphs


schoi-habana · 2024-02-14T07:28:37Z

@puneeshkhanna it may be related to DeepSpeed version, too. I'm aware of that error but couldn't repro in my test env. Which DS version is used in QA?

puneeshkhanna · 2024-02-14T09:14:00Z

@schoi-habana - I'm using the latest master of deepspeed-fork.
DeepSpeed info: version=0.12.4+f4fa754c, git-hash=f4fa754c

Below are the contents of the checkpoints:
ls -al /mnt/weka/data/pytorch/llama2/Llama-2-70b-hf/
total 269444208
drwxrwxr-x 1 2002 2002 0 Oct 18 12:39 .
drwxrwxr-x 1 2002 2002 0 Oct 18 12:39 ..
-rw-rw-r-- 1 2002 2002 609 Oct 18 13:05 config.json
-rw-rw-r-- 1 2002 2002 167 Oct 18 13:24 generation_config.json
-rw-rw-r-- 1 2002 2002 7020 Oct 18 13:09 LICENSE.txt
-rw-rw-r-- 1 2002 2002 9852591960 Oct 18 13:14 model-00001-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099016 Oct 18 13:11 model-00002-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:24 model-00003-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:09 model-00004-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:28 model-00005-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:17 model-00006-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:41 model-00007-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:27 model-00008-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:30 model-00009-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:22 model-00010-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9965870512 Oct 18 13:20 model-00011-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798066064 Oct 18 13:43 model-00012-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9798099056 Oct 18 13:37 model-00013-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 9496124816 Oct 18 13:50 model-00014-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 524288128 Oct 18 13:25 model-00015-of-00015.safetensors
-rw-rw-r-- 1 2002 2002 66725 Oct 18 13:24 model.safetensors.index.json
-rw-rw-r-- 1 2002 2002 9852605685 Oct 18 13:34 pytorch_model-00001-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:13 pytorch_model-00002-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:19 pytorch_model-00003-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:10 pytorch_model-00004-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:18 pytorch_model-00005-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:08 pytorch_model-00006-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:13 pytorch_model-00007-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:37 pytorch_model-00008-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:10 pytorch_model-00009-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113337 Oct 18 13:21 pytorch_model-00010-of-00015.bin
-rw-rw-r-- 1 2002 2002 9965883861 Oct 18 13:23 pytorch_model-00011-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798079785 Oct 18 13:26 pytorch_model-00012-of-00015.bin
-rw-rw-r-- 1 2002 2002 9798113313 Oct 18 13:19 pytorch_model-00013-of-00015.bin
-rw-rw-r-- 1 2002 2002 9496138113 Oct 18 13:18 pytorch_model-00014-of-00015.bin
-rw-rw-r-- 1 2002 2002 524288938 Oct 18 13:21 pytorch_model-00015-of-00015.bin
-rw-rw-r-- 1 2002 2002 66725 Oct 18 13:24 pytorch_model.bin.index.json
-rw-rw-r-- 1 2002 2002 10372 Oct 18 13:23 README.md
-rw-rw-r-- 1 2002 2002 1253223 Oct 18 13:09 Responsible-Use-Guide.pdf
-rw-rw-r-- 1 2002 2002 414 Oct 18 13:41 special_tokens_map.json
-rw-rw-r-- 1 2002 2002 776 Oct 18 13:27 tokenizer_config.json
-rw-rw-r-- 1 2002 2002 1842767 Oct 18 13:34 tokenizer.json
-rw-rw-r-- 1 2002 2002 499723 Oct 18 13:18 tokenizer.model
-rw-rw-r-- 1 2002 2002 4766 Oct 18 13:05 USE_POLICY.md
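
The listing above contains two index files, `model.safetensors.index.json` and `pytorch_model.bin.index.json`. For context, this is a minimal sketch of how shard names can be read from such an index, which is the approach the PR description attributes to PR#15. It assumes the standard Hugging Face sharded-checkpoint layout, where a `weight_map` object maps each parameter name to the shard that stores it; the function name `shard_names_from_index` is made up for this example.

```python
import json


def shard_names_from_index(index_path):
    """Return the sorted, de-duplicated shard file names listed in an index file."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps parameter name -> shard file name; many parameters
    # share a shard, so collapse the values into a set before sorting.
    return sorted(set(index["weight_map"].values()))
```

This approach raises an error when the index file is absent, which is the situation the PR description reports for hub downloads.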

schoi-habana · 2024-02-14T18:05:14Z

@puneeshkhanna DeepSpeed-fork git-hash=f4fa754c is not the latest; a later change is needed to load safetensors.
In any case, this PR loads the .bin files, so you don't need to check out the later changes in DS-fork. Can you test this PR in your env?
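
To illustrate the .bin versus .safetensors distinction raised here, the sketch below picks a loader from the file suffix. It is an assumption-laden example, not DeepSpeed's or this PR's code: `checkpoint_format` and `load_shard` are hypothetical helpers, and the only library calls used are `torch.load` and `safetensors.torch.load_file`. Whether a suffix mismatch is the actual cause of the UnpicklingError reported earlier in this thread is not confirmed by the discussion.

```python
from pathlib import Path


def checkpoint_format(path):
    """Classify a checkpoint file by suffix: 'safetensors' or 'torch'."""
    suffix = Path(path).suffix
    if suffix == ".safetensors":
        return "safetensors"
    if suffix in (".bin", ".pt"):
        return "torch"
    raise ValueError(f"unrecognized checkpoint suffix: {suffix!r}")


def load_shard(path):
    """Load one shard onto the CPU with the loader that matches its format."""
    if checkpoint_format(path) == "safetensors":
        # Imported lazily so the classifier above works without safetensors installed.
        from safetensors.torch import load_file

        return load_file(str(path), device="cpu")
    import torch

    return torch.load(path, map_location="cpu")
```

A safetensors file is not a pickle stream, so passing one to a pickle-based loader is expected to fail during unpickling.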

mandy-li

LGTM

* enable loading falcon-180b ckpt in .safetensors format
* Address comments borrowing transformer's way of reading ckpt file
* address comments
* Update ckpt loading
  PR#15 reads a set of ckpt file names from the index json file. When OH downloads files from the hub instead of loading from a cache dir, get_repo_root() skips downloading the index json file. Thus the PR#15 fails to load file names. This PR scans the path and returns a list of names that matches the pattern
* import modeling_utils from transformers

astachowiczhabana · 2024-06-07T14:20:03Z

huggingface#773


schoi-habana added 4 commits February 1, 2024 23:10

enable loading falcon-180b ckpt in .safetensors format

70c9540

Address comments borrowing transformer's way of reading ckpt file

21f3869

address comments

6b68d4e

Merge branch 'habana-main' into schoi/falcon_180b

6171905

schoi-habana requested review from mandy-li and vivekgoe February 14, 2024 01:17

schoi-habana marked this pull request as ready for review February 14, 2024 01:18

schoi-habana force-pushed the schoi/falcon_180b branch from 4f4b772 to f32ae33 Compare February 14, 2024 07:07

mandy-li reviewed Feb 14, 2024


Comment thread optimum/habana/checkpoint_utils.py Outdated

import modeling_utils from transformers

b74ccce

mandy-li self-requested a review February 14, 2024 19:58

mandy-li approved these changes Feb 14, 2024


mandy-li merged commit 453b14a into habana-main Feb 14, 2024

astachowiczhabana added a commit that referenced this pull request Nov 28, 2024

[SW-195484] Fix dtype issue in valid sequence length with bs=1 (#38) (#…

b263f00

…39) Co-authored-by: Adam Stachowicz <105052242+astachowiczhabana@users.noreply.github.com>

xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025

[SW-195484] Fix dtype issue in valid sequence length with bs=1 (#38)

fe513b8

mandy-li merged 6 commits into habana-main from schoi/falcon_180b
