Hello!
I have a fine-tuned LLM from Hugging Face saved in PEFT format; the checkpoint is about 2.1 GB. When I export it to ONNX, its size nearly doubles to about 4.1 GB. What causes this significant increase in model size when converting from PEFT to ONNX? Is there a bug in the conversion? The code I use is in the Reproduction section below; note that loading the model with any of the commented-out options destroys accuracy. Thanks!
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
System Info
Reproduction (minimal, reproducible, runnable)
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Load the PEFT checkpoint and export it to ONNX, running on the
# OpenVINO execution provider with the FP16 GPU device setting.
model = ORTModelForCausalLM.from_pretrained(
    peft_path,
    provider='OpenVINOExecutionProvider',
    provider_options={'device_type': 'GPU_FP16'},
    export=True,
    # Enabling any of the options below destroys accuracy:
    # use_cache=False,
    # use_io_binding=False,
    # load_in_4bit=True,
    # load_in_8bit=True,
    # torch_dtype=torch.bfloat16,
    # device_map=device,
    # from_transformers=True,
)
tokenizer = AutoTokenizer.from_pretrained(peft_path)

model.save_pretrained(onnx_path)
tokenizer.save_pretrained(onnx_path)
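For reference, a common cause of a roughly 2x size increase is that the ONNX exporter writes fp32 initializers (4 bytes per parameter) even when the source checkpoint is stored in fp16/bf16 (2 bytes per parameter). Here is a minimal sketch to check whether that is happening; it assumes the export produced a file named model.onnx in onnx_path and that the checkpoint directory contains adapter_model.safetensors (adjust the filenames to match your directories):

import os
import onnx
from onnx import TensorProto
from safetensors import safe_open

# Dtypes of the exported ONNX weights (metadata only; external data not loaded).
m = onnx.load(os.path.join(onnx_path, 'model.onnx'), load_external_data=False)
print('ONNX initializer dtypes:',
      {TensorProto.DataType.Name(init.data_type) for init in m.graph.initializer})

# Dtypes stored in the original checkpoint (a few tensors are enough).
with safe_open(os.path.join(peft_path, 'adapter_model.safetensors'), framework='pt') as f:
    for name in list(f.keys())[:5]:
        print(name, f.get_tensor(name).dtype)

If the ONNX side reports FLOAT while the checkpoint tensors are torch.float16 or torch.bfloat16, the size difference is the dtype, not a duplication bug.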
Expected behavior
I expect the exported ONNX model to be roughly the same size as the PEFT checkpoint, without losing accuracy.
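In case it helps, one possible workaround (untested on this model, and assuming the size increase is caused by an fp32 export) is to convert the exported graph back to fp16 with the onnxconverter-common package; whether this preserves accuracy would still need to be verified:

import os
import onnx
from onnxconverter_common import float16

# Convert the fp32 ONNX graph to fp16, keeping fp32 inputs/outputs so the
# model's external interface is unchanged.
model_fp32 = onnx.load(os.path.join(onnx_path, 'model.onnx'))
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)

# Graphs near or over the 2 GB protobuf limit must be saved with external data.
onnx.save(model_fp16, os.path.join(onnx_path, 'model_fp16.onnx'),
          save_as_external_data=True)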