Add the safetensors support #770

Closed
yuanwu2017 wants to merge 1 commit into
huggingface:mainfrom
yuanwu2017:tgi

@yuanwu2017
Contributor

What does this PR do?

Some models ship only safetensors weight files. This PR adds safetensors support.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 yuanwu2017 requested a review from regisss as a code owner March 6, 2024 13:37
Collaborator

@libinta libinta left a comment

@yuanwu2017 can you provide more information about this change, such as the failure log and which model you are running?

@yao-matrix
Contributor

@libinta, this is a customer-reported issue: when they use AutoTP to run models such as upstage/SOLAR-10.7B-v1.0, which has only safetensors weights, tgi-gaudi crashes. The command and log are below.

Fixing it requires this PR plus another PR in Habana's DeepSpeed. Thanks.

docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 2

2024-03-06T03:07:40.038707Z INFO text_generation_launcher: Args { model_id: "upstage/SOLAR-10.7B-v1.0", revision: None, validation_workers: 2, sharded: Some(true), num_shard: Some(2), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "8da2a7e801cd", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-06T03:07:40.038729Z INFO text_generation_launcher: Sharding model on 2 processes
2024-03-06T03:07:40.038815Z INFO download: text_generation_launcher: Starting download process.
2024-03-06T03:07:41.988320Z INFO text_generation_launcher: Files are already present on the host. Skipping download.

2024-03-06T03:07:42.241217Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-06T03:07:42.241631Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-03-06T03:07:45.846512Z INFO text_generation_launcher: CLI SHARDED = True DTYPE = bfloat16

2024-03-06T03:07:45.846673Z INFO text_generation_launcher: CLI SHARDED = 2

2024-03-06T03:07:45.846750Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 2 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id upstage/SOLAR-10.7B-v1.0 --revision None --sharded True --dtype bfloat16 --uds_path /tmp/text-generation-server

2024-03-06T03:07:52.249558Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:02.257380Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:12.264563Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:22.271436Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:31.413123Z ERROR text_generation_launcher: deepspeed --num_nodes 1 --num_gpus 2 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id upstage/SOLAR-10.7B-v1.0 --revision None --sharded True --dtype bfloat16 --uds_path /tmp/text-generation-server exited with status = 1

2024-03-06T03:08:31.878474Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
2024-03-06 03:07:55.916 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-03-06 03:07:55.916 | INFO | __main__:main:11 - TGIService: --model_id upstage/SOLAR-10.7B-v1.0, --revision None, --sharded True, --dtype bfloat16, --uds_path /tmp/text-generation-server
2024-03-06 03:07:55.938 | INFO | __main__:main:10 - TGIService: starting tgi service ....
2024-03-06 03:07:55.938 | INFO | __main__:main:11 - TGIService: --model_id upstage/SOLAR-10.7B-v1.0, --revision None, --sharded True, --dtype bfloat16, --uds_path /tmp/text-generation-server
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
  warnings.warn(
Loading 0 checkpoint shards: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 610, in __init__
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 346, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 168, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 340, in replace_transformer_layer
    replaced_module = set_lm_head(replaced_module)
UnboundLocalError: local variable 'replaced_module' referenced before assignment
Loading 0 checkpoint shards: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in <module>
    main(args)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 191, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 152, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 610, in __init__
    model = deepspeed.init_inference(model, **ds_inference_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 346, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 168, in __init__
    self._apply_injection_policy(config, client_module)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 417, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 340, in replace_transformer_layer
    replaced_module = set_lm_head(replaced_module)
UnboundLocalError: local variable 'replaced_module' referenced before assignment
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
Loading 0 checkpoint shards: 0it [00:00, ?it/s]
terminate called after throwing an instance of 'std::system_error'
what(): Connection reset by peer
Internal Error: Received signal - Aborted rank=0
2024-03-06T03:08:31.973607Z ERROR text_generation_launcher: Shard 0 failed to start
2024-03-06T03:08:31.973620Z INFO text_generation_launcher: Shutting down shards
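
One plausible reading of the log above is that zero checkpoint shards were found ("Loading 0 checkpoint shards"), so a variable that is only assigned while iterating over shards was never bound. The sketch below reproduces that class of error in isolation. It is an assumption for illustration only: `replace_layers` and `replace_layers_guarded` are hypothetical stand-ins, not DeepSpeed's code.

```python
def replace_layers(checkpoint_files):
    """Hypothetical stand-in: the result is assigned only inside the loop."""
    for path in checkpoint_files:
        replaced_module = f"loaded:{path}"
    # If checkpoint_files is empty, the loop body never ran and the
    # name below is unbound, which raises UnboundLocalError.
    return replaced_module


def replace_layers_guarded(checkpoint_files):
    """Same hypothetical logic, failing early with a clearer message."""
    if not checkpoint_files:
        raise FileNotFoundError(
            "no checkpoint shards found; check which weight file suffixes are searched"
        )
    replaced_module = None
    for path in checkpoint_files:
        replaced_module = f"loaded:{path}"
    return replaced_module


if __name__ == "__main__":
    try:
        replace_layers([])
    except UnboundLocalError as exc:
        print(type(exc).__name__)
```

With an empty file list the unguarded version fails late with an unrelated-looking error, while the guarded version points at the missing weight files directly.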

@yuanwu2017
Contributor Author

file_list = [str(entry) for entry in Path(cached_repo_dir).rglob("*.[bp][it][n]") if entry.is_file()]
return file_list
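The first line also has a bug. The pattern `*.[bp][it][n]` only matches three-character suffixes built from those character classes, so in practice it finds files ending in .bin and .ptn and misses .pt and .safetensors. It also returns files with different suffixes in the same list. The correct result should contain files of only one suffix type at a time. The test below shows the current behavior.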

(base) sdp@a4bf01943df8:~/yuanwu/test$ cat files.py
from pathlib import Path
file_list = [str(entry) for entry in Path("./").rglob("*.[bp][it][n]") if entry.is_file()]
print(file_list)
(base) sdp@a4bf01943df8:~/yuanwu/test$ ls
1.bin  3.pt  3.ptn  3.safetensors  files.py
(base) sdp@a4bf01943df8:~/yuanwu/test$ python files.py
['3.ptn', '1.bin']
(base) sdp@a4bf01943df8:~/yuanwu/test$ touch 2.bin
(base) sdp@a4bf01943df8:~/yuanwu/test$ python files.py
['2.bin', '3.ptn', '1.bin']
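
The behavior in the transcript above can be checked without touching the filesystem, and the "one suffix type at a time" idea can be sketched as a small helper. This is a minimal sketch under stated assumptions: `matches_old_pattern`, `list_checkpoint_files`, and the preference order in `SUFFIX_PREFERENCE` are hypothetical and are not taken from the PR diff or from tgi-gaudi.

```python
from fnmatch import fnmatchcase
from pathlib import Path

PATTERN = "*.[bp][it][n]"  # the pattern quoted in the comment above


def matches_old_pattern(name: str) -> bool:
    """True if the file name matches the three-character-class pattern."""
    return fnmatchcase(name, PATTERN)


# Hypothetical preference order; the thread does not say which suffix
# should win when several kinds of weight files are present.
SUFFIX_PREFERENCE = (".safetensors", ".bin", ".pt")


def list_checkpoint_files(root, preference=SUFFIX_PREFERENCE):
    """Return files of the first suffix in `preference` that exists under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    for suffix in preference:
        chosen = sorted(str(p) for p in files if p.suffix == suffix)
        if chosen:
            return chosen  # only one suffix type is ever returned
    return []


if __name__ == "__main__":
    names = ["1.bin", "3.pt", "3.ptn", "3.safetensors"]
    print([n for n in names if matches_old_pattern(n)])  # ['1.bin', '3.ptn']
```

Checking suffixes explicitly avoids the accidental matches (such as .ptn) that come from combining character classes, at the cost of having to pick a preference order.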

@emascarenhas
Contributor

Can you provide a link for the associated DeepSpeed PR? Thanks.

@yao-matrix
Contributor

Can you provide a link for the associated DeepSpeed PR? Thanks.

HabanaAI/DeepSpeed#2. Habana will include it in the next release.

@yuanwu2017
Contributor Author

@libinta @regisss Please help to review the patch.

@libinta
Collaborator

libinta commented Mar 19, 2024

@libinta @regisss Please help to review the patch.

@yuanwu2017 does #773 resolve your issue?

@yuanwu2017
Contributor Author

It is ok. I will close the PR.

@yuanwu2017 yuanwu2017 closed this Mar 19, 2024
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 6, 2025
…arquet version) (huggingface#2305) (huggingface#770)

Co-authored-by: Grzegorz Pluto-Prondzinski <gplutopx@habana.ai>