This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

No effect while running models with `trust_remote_code=True` #1352

Closed
mudler opened this issue Mar 6, 2024 · 3 comments · Fixed by #1354
Comments

@mudler

mudler commented Mar 6, 2024

Hi 👋

I'm the LocalAI author here, and I'm trying to implement transformers support for Intel GPUs in mudler/LocalAI#1746.

I'm struggling to make the example here work: following the quick start in this repository on top of the oneAPI container image (and installing intel-extension-for-transformers with pip), the `trust_remote_code` option seems to be completely ignored:

/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-03-06 19:29:10,005 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
qwen.tiktoken: 100%|██████████████████████████████████████████████████████████████| 2.56M/2.56M [00:00<00:00, 4.06MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████| 911/911 [00:00<00:00, 2.99MB/s]
The repository for Qwen/Qwen-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/Qwen/Qwen-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

To note, I have the latest transformers (4.38.2) and I just followed the documentation. Things otherwise seem to work, but `trust_remote_code` seems to be completely ignored.
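
For reference, when `trust_remote_code=True` actually reaches plain transformers' `from_pretrained`, that prompt does not appear, so the flag looks like it is being dropped somewhere along the way. A minimal sanity check of the plain-transformers behaviour (illustrative only, plain CPU load, not part of the quick start):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sanity check, not from the quick start: load the same model with
# plain transformers so trust_remote_code=True reaches from_pretrained directly.
# With the flag set, the interactive custom-code prompt should not be shown.
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16)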

@mudler
Author

mudler commented Mar 6, 2024

For context, this is the code from the quickstart that I'm running:

import intel_extension_for_pytorch as ipex
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch  # needed for torch.float16 below

device_map = "xpu"
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
prompt = "Once upon a time, there existed a little girl,"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
                                              device_map=device_map, load_in_4bit=True)

model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)

output = model.generate(inputs)
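
My guess is that the wrapper's `from_pretrained` simply doesn't forward the kwarg to the underlying transformers loader. A purely hypothetical sketch of the kind of forwarding I would expect (illustrative only, not the actual intel-extension-for-transformers code):

from transformers import AutoModelForCausalLM as HFAutoModelForCausalLM

# Hypothetical illustration only, not the library's real implementation: a
# wrapper whose from_pretrained passes trust_remote_code (and any other kwargs)
# straight through to the underlying transformers loader, so the interactive
# custom-code prompt is never triggered.
class ForwardingAutoModelForCausalLM:
    @classmethod
    def from_pretrained(cls, model_name, **kwargs):
        return HFAutoModelForCausalLM.from_pretrained(model_name, **kwargs)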

@zhenwei-intel
Contributor

zhenwei-intel commented Mar 7, 2024

Hi @mudler,

Thank you for your feedback. I fixed this issue in PR #1354 and updated the demo:

import intel_extension_for_pytorch as ipex
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch

device = "xpu"
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
prompt = "Once upon a time, there existed a little girl,"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

qmodel = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="xpu", trust_remote_code=True)

# Optimize the model with ipex; it will improve performance.
qmodel = ipex.optimize_transformers(qmodel, inplace=True, dtype=torch.float16, quantization_config={}, device="xpu")

output = qmodel.generate(inputs)
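
To see the generated text, the token IDs returned by `generate` can be decoded with the same tokenizer, for example (not part of the demo above):

# Not part of the demo above: decode the generated token IDs back into text.
print(tokenizer.decode(output[0].cpu().tolist()))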

@mudler
Author

mudler commented Mar 7, 2024

that was quick, thanks!

@DDEle linked a pull request Mar 7, 2024 that will close this issue