This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

No effect while running models with `trust_remote_code=True` #1352

Closed
mudler opened this issue Mar 6, 2024 · 3 comments · Fixed by #1354
Comments

@mudler

mudler commented Mar 6, 2024

Hi 👋

I'm the LocalAI author here, and I'm trying to implement transformers support for Intel GPUs in mudler/LocalAI#1746.

I'm struggling to make the example here work: following the quick start in this repository on top of the oneAPI container image (and installing intel-extension-for-transformers with pip), the `trust_remote_code` option seems to be completely ignored:

/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-03-06 19:29:10,005 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
qwen.tiktoken: 100%|██████████████████████████████████████████████████████████████| 2.56M/2.56M [00:00<00:00, 4.06MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████| 911/911 [00:00<00:00, 2.99MB/s]
The repository for Qwen/Qwen-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/Qwen/Qwen-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

To note, I have the latest transformers (4.38.2) and I just followed the documentation. Things otherwise seem to work, but `trust_remote_code` seems to be completely ignored.
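
For reference, when `trust_remote_code=True` actually reaches plain transformers' `from_pretrained`, that prompt does not appear, so the flag looks like it is being dropped somewhere along the way. A minimal sanity check of the plain-transformers behaviour (illustrative only, plain CPU load, not part of the quick start):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sanity check, not from the quick start: load the same model with
# plain transformers so trust_remote_code=True reaches from_pretrained directly.
# With the flag set, the interactive custom-code prompt should not be shown.
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16)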

@mudler
Author

mudler commented Mar 6, 2024

For context, this is the code from the quickstart that I'm running:

import intel_extension_for_pytorch as ipex
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch  # needed for torch.float16 below

device_map = "xpu"
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
prompt = "Once upon a time, there existed a little girl,"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
                                              device_map=device_map, load_in_4bit=True)

model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)

output = model.generate(inputs)
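
My guess is that the wrapper's `from_pretrained` simply doesn't forward the kwarg to the underlying transformers loader. A purely hypothetical sketch of the kind of forwarding I would expect (illustrative only, not the actual intel-extension-for-transformers code):

from transformers import AutoModelForCausalLM as HFAutoModelForCausalLM

# Hypothetical illustration only, not the library's real implementation: a
# wrapper whose from_pretrained passes trust_remote_code (and any other kwargs)
# straight through to the underlying transformers loader, so the interactive
# custom-code prompt is never triggered.
class ForwardingAutoModelForCausalLM:
    @classmethod
    def from_pretrained(cls, model_name, **kwargs):
        return HFAutoModelForCausalLM.from_pretrained(model_name, **kwargs)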

@zhenwei-intel
Contributor

zhenwei-intel commented Mar 7, 2024

Hi @mudler,

Thank you for your feedback. I fixed this issue in PR #1354 and updated the demo:

import intel_extension_for_pytorch as ipex
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
from transformers import AutoTokenizer
import torch

device = "xpu"
model_name = "Qwen/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
prompt = "Once upon a time, there existed a little girl,"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

qmodel = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="xpu", trust_remote_code=True)

# Optimize the model with ipex; it will improve performance.
qmodel = ipex.optimize_transformers(qmodel, inplace=True, dtype=torch.float16, quantization_config={}, device="xpu")

output = qmodel.generate(inputs)
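
To see the generated text, the token IDs returned by `generate` can be decoded with the same tokenizer, for example (not part of the demo above):

# Not part of the demo above: decode the generated token IDs back into text.
print(tokenizer.decode(output[0].cpu().tolist()))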

@mudler
Author

mudler commented Mar 7, 2024

that was quick, thanks!

@DDEle linked a pull request Mar 7, 2024 that will close this issue