Add the safetensors support by yuanwu2017 · Pull Request #770 · huggingface/optimum-habana

yuanwu2017 · 2024-03-06T13:37:35Z

What does this PR do?

Some models ship only safetensors weight files. This PR adds safetensors support.

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Signed-off-by: yuanwu <yuan.wu@intel.com>

libinta

@yuanwu2017 can you provide more information about this change, such as the failure log and which model you are running?

yao-matrix · 2024-03-07T00:24:04Z

@libinta, this is a customer-reported issue. When they use AutoTP to run models like upstage/SOLAR-10.7B-v1.0, which has only safetensors weights, tgi-gaudi crashes with the log below.

To fix this issue, we need this PR and another PR in Habana's DeepSpeed. Thanks.

docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 2

2024-03-06T03:07:40.038707Z INFO text_generation_launcher: Args { model_id: "upstage/SOLAR-10.7B-v1.0", revision: None, validation_workers: 2, sharded: Some(true), num_shard: Some(2), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "8da2a7e801cd", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-06T03:07:40.038729Z INFO text_generation_launcher: Sharding model on 2 processes
2024-03-06T03:07:40.038815Z INFO download: text_generation_launcher: Starting download process.
2024-03-06T03:07:41.988320Z INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-06T03:07:42.241217Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-06T03:07:42.241631Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-06T03:07:45.846512Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16

2024-03-06T03:07:45.846673Z INFO text_generation_launcher: CLI SHARDED = 2

2024-03-06T03:07:45.846750Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 2 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id upstage/SOLAR-10.7B-v1.0 --revision None --sharded True --dtype bfloat16 --uds_path /tmp/text-generation-server

2024-03-06T03:07:52.249558Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:02.257380Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:12.264563Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:22.271436Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:31.413123Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 2 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id upstage/SOLAR-10.7B-v1.0 --revision None --sharded True --dtype bfloat16 --uds_path /tmp/text-generation-server exited with status = 1

2024-03-06T03:08:31.878474Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
2024-03-06 03:07:55.916 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-03-06 03:07:55.916 | INFO | __main__:main:11 - TGIService: --model_id upstage/SOLAR-10.7B-v1.0, --revision None, --sharded True, --dtype bfloat16, --uds_path /tmp/text-generation-server
2024-03-06 03:07:55.938 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-03-06 03:07:55.938 | INFO | __main__:main:11 - TGIService: --model_id upstage/SOLAR-10.7B-v1.0, --revision None, --sharded True, --dtype bfloat16, --uds_path /tmp/text-generation-server
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 610, in __init__
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 346, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 168, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 340, in replace_transformer_layer
    replaced_module = set_lm_head(replaced_module)
UnboundLocalError: local variable 'replaced_module' referenced before assignment
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 610, in __init__
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 346, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 168, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 340, in replace_transformer_layer
    replaced_module = set_lm_head(replaced_module)
UnboundLocalError: local variable 'replaced_module' referenced before assignment
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
terminate called after throwing an instance of 'std::system_error'
what(): Connection reset by peer
Internal Error: Received signal - Aborted rank=0
2024-03-06T03:08:31.973607Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-06T03:08:31.973620Z INFO text_generation_launcher: Shutting down shards
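
For readers unfamiliar with this failure mode: the log reports "Loading 0 checkpoint shards" immediately before the UnboundLocalError. One plausible reading is that a loop over an empty checkpoint list never assigns the variable that is used afterwards. The sketch below is a generic Python illustration of that pattern only. `replace_layers` and its body are hypothetical and are not DeepSpeed's code.

```python
def replace_layers(checkpoint_files):
    """Hypothetical stand-in: assigns `replaced` only inside the loop."""
    for path in checkpoint_files:
        # Pretend to load one shard and remember the result.
        replaced = f"loaded:{path}"
    # With an empty list the loop body never runs, so `replaced`
    # was never bound and this line raises UnboundLocalError.
    return replaced


if __name__ == "__main__":
    try:
        replace_layers([])  # no checkpoint files were discovered
    except UnboundLocalError as err:
        print(f"UnboundLocalError: {err}")
```

If that reading is right, the crash is a symptom: the real problem is that no weight files were found for a safetensors-only model.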

yuanwu2017 · 2024-03-07T01:07:10Z

file_list = [str(entry) for entry in Path(cached_repo_dir).rglob("*.[bp][it][n]") if entry.is_file()]
return file_list
The first line also has a bug. The pattern matches only three-character suffixes such as .bin and .ptn, so it misses .pt and .safetensors files. It also returns files with different suffixes in the same list. The correct result should contain files of only one suffix type at a time.

(base) sdp@a4bf01943df8:~/yuanwu/test$ cat files.py
from pathlib import Path
file_list = [str(entry) for entry in Path("./").rglob("*.[bp][it][n]") if entry.is_file()]
print(file_list)
(base) sdp@a4bf01943df8:~/yuanwu/test$ ls
1.bin  3.pt  3.ptn  3.safetensors  files.py
(base) sdp@a4bf01943df8:~/yuanwu/test$ python files.py
['3.ptn', '1.bin']
(base) sdp@a4bf01943df8:~/yuanwu/test$ touch 2.bin
(base) sdp@a4bf01943df8:~/yuanwu/test$ python files.py
['2.bin', '3.ptn', '1.bin']
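
One way to get a single suffix type per result is to try an ordered list of suffixes and return the first non-empty match. This is a minimal sketch under stated assumptions, not the code in this PR: the function name `find_weight_files` and the preference order in `PREFERRED_SUFFIXES` are hypothetical.

```python
from pathlib import Path

# Assumed preference order (hypothetical); adjust to the loader's needs.
PREFERRED_SUFFIXES = (".safetensors", ".bin", ".pt")


def find_weight_files(repo_dir, suffixes=PREFERRED_SUFFIXES):
    """Return files of the first suffix in `suffixes` that has any match.

    Only one suffix type is ever returned, so .bin and .safetensors
    files are never mixed in the same list.
    """
    root = Path(repo_dir)
    for suffix in suffixes:
        # "*.bin" must match the whole file name, so "3.ptn" is not
        # picked up when searching for "*.pt".
        matches = sorted(str(p) for p in root.rglob(f"*{suffix}") if p.is_file())
        if matches:
            return matches
    return []


if __name__ == "__main__":
    print(find_weight_files("./"))
```

Sorting the matches makes the result independent of the directory iteration order, which differs across filesystems.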

emascarenhas · 2024-03-08T15:45:38Z

Can you provide a link for the associated DeepSpeed PR? Thanks.

yao-matrix · 2024-03-11T05:06:02Z

> Can you provide a link for the associated DeepSpeed PR? Thanks.

HabanaAI/DeepSpeed#2. Habana will include it in the next release.

yuanwu2017 · 2024-03-19T00:29:45Z

@libinta @regisss Please help to review the patch.

libinta · 2024-03-19T00:39:44Z

> @libinta @regisss Please help to review the patch.

@yuanwu2017 does #773 resolve your issue?

yuanwu2017 · 2024-03-19T00:56:38Z

It is ok. I will close the PR.

…arquet version) (huggingface#2305) (huggingface#770) Co-authored-by: Grzegorz Pluto-Prondzinski <gplutopx@habana.ai>

Add the safetensors support

2b20c15

Signed-off-by: yuanwu <yuan.wu@intel.com>

yuanwu2017 requested a review from regisss as a code owner March 6, 2024 13:37

libinta reviewed Mar 6, 2024

yuanwu2017 closed this Mar 19, 2024

yuanwu2017 wants to merge 1 commit into huggingface:main from yuanwu2017:tgi


4 participants
