Add the safetensors support #770
Signed-off-by: yuanwu <yuan.wu@intel.com>
libinta
left a comment
@yuanwu2017 Can you provide more information about this change, such as the failure output and which model you are running?
@libinta, this is a customer-reported issue. When they use AutoTP to run models such as upstage/SOLAR-10.7B-v1.0, which has only safetensors weights, tgi-gaudi crashes with the log below. Fixing it requires this PR and another PR in Habana's DeepSpeed. Thanks.
docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id $model --sharded true --num-shard 2
2024-03-06T03:07:40.038707Z INFO text_generation_launcher: Args { model_id: "upstage/SOLAR-10.7B-v1.0", revision: None, validation_workers: 2, sharded: Some(true), num_shard: Some(2), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "8da2a7e801cd", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-03-06T03:07:42.241217Z INFO download: text_generation_launcher: Successfully downloaded weights.
2024-03-06T03:07:45.846673Z INFO text_generation_launcher: CLI SHARDED = 2
2024-03-06T03:07:45.846750Z INFO text_generation_launcher: CLI server start deepspeed =deepspeed --num_nodes 1 --num_gpus 2 --no_local_rank /usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py --model_id upstage/SOLAR-10.7B-v1.0 --revision None --sharded True --dtype bfloat16 --uds_path /tmp/text-generation-server
2024-03-06T03:07:52.249558Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-03-06T03:08:31.878474Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
file_list = [str(entry) for entry in Path(cached_repo_dir).rglob("*.[bp][it][n]") if entry.is_file()]
Can you provide a link to the associated DeepSpeed PR? Thanks.
HabanaAI/DeepSpeed#2. Habana will include it in the next release.
@yuanwu2017 does #773 resolve your issue? |
It is ok. I will close the PR. |
What does this PR do?
Some models ship only safetensors weight files. This PR adds safetensors support.
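For illustration only, here is a minimal sketch of the kind of change this implies, assuming the checkpoint lookup resembles the `rglob` line quoted in the conversation, whose three-character glob matches `.bin` but not `.safetensors`. The helper name `list_checkpoint_files` and the suffix set are hypothetical and are not taken from this PR's diff or from DeepSpeed.

```python
from pathlib import Path

# Assumed set of weight-file extensions; the original glob effectively
# matched ".bin", and ".safetensors" is the addition discussed here.
WEIGHT_SUFFIXES = {".bin", ".safetensors"}


def list_checkpoint_files(cached_repo_dir):
    """Return sorted paths of weight files found under cached_repo_dir.

    Hypothetical helper: walks the directory recursively and keeps regular
    files whose extension is in WEIGHT_SUFFIXES.
    """
    return sorted(
        str(entry)
        for entry in Path(cached_repo_dir).rglob("*")
        if entry.is_file() and entry.suffix in WEIGHT_SUFFIXES
    )
```

Filtering on `Path.suffix` instead of a character-class glob keeps the accepted extensions explicit, so a repository that contains only `model.safetensors` shards would no longer yield an empty file list.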