
'aten::repeat_interleave.Tensor' CPU-fallback warning causes "unbox expects Dml at::Tensor as inputs" error #449

Open
NateAGeek opened this issue May 3, 2023 · 4 comments
Labels
pytorch-directml Issues in PyTorch when using its DirectML backend

Comments

@NateAGeek

NateAGeek commented May 3, 2023

Hello,

I am currently trying to run a slightly modified StableLM example.

import torch
import torch_directml
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

# Get Direct ML device
dml = torch_directml.device()

# Load in the model
model_name = "stabilityai/stablelm-tuned-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./offload",
).to(dml)

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to(dml)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

I am getting this warning first:

transformers\generation\utils.py:690: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)

Then I get this error:

RuntimeError: tensor.device().type() == at::DeviceType::PrivateUse1 INTERNAL ASSERT FAILED at "D:\\a\\_work\\1\\s\\pytorch-directml-plugin\\torch_directml\\csrc\\dml\\DMLTensor.cpp":31, please report a bug to PyTorch. unbox expects Dml at::Tensor as inputs

I assume this error happens because the tokenized inputs are supposed to stay on the GPU, but since the DirectML backend does not support aten::repeat_interleave.Tensor, that operation falls back to the CPU while my model remains on the GPU, hence the "unbox expects Dml at::Tensor as inputs" error. Is there an update on when aten::repeat_interleave.Tensor will be supported, or a possible workaround for the time being?
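A rough, untested workaround sketch (my own assumption, not an official torch-directml API): monkey-patch torch.Tensor.repeat_interleave so this one operator runs on the CPU and its result is moved back onto the DML device, keeping downstream ops fed with DML tensors. It assumes torch-directml tensors report the device type "privateuseone", as the warnings in this thread suggest:

import torch

# Untested sketch: route repeat_interleave through the CPU and move the result
# back to the original (DML) device. "privateuseone" is the device type that
# torch-directml tensors report (per the warnings in this thread).
_orig_repeat_interleave = torch.Tensor.repeat_interleave

def _cpu_repeat_interleave(self, repeats, *args, **kwargs):
    if self.device.type == "privateuseone":
        # `repeats` may be an int or a tensor; move tensors to the CPU too.
        if isinstance(repeats, torch.Tensor):
            repeats = repeats.cpu()
        out = _orig_repeat_interleave(self.cpu(), repeats, *args, **kwargs)
        return out.to(self.device)
    return _orig_repeat_interleave(self, repeats, *args, **kwargs)

torch.Tensor.repeat_interleave = _cpu_repeat_interleave

Applied before model.generate(...), this should let the expand step in transformers' generation utilities return a DML tensor instead of a CPU one; whether it avoids the internal assert is untested.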

Thank you :D

@fdwr fdwr added the pytorch-directml Issues in PyTorch when using its DirectML backend label May 10, 2023
@AgentSmithers

I'm receiving the same issue, running on an AMD 7900 XT:
import time

import torch
import torch_directml
from transformers import AutoTokenizer, AutoModelForCausalLM

dml = torch_directml.device()

model_name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: the model is loaded on the CPU and never moved to the DML device
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

timea = time.time()
prompt = "A lion is"
inputs = tokenizer(prompt, return_tensors='pt').to(dml)  # inputs go to DML
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=True, temperature=0.75, return_dict_in_generate=True
)
token = outputs.sequences[0]
output_str = tokenizer.decode(token)
print(output_str)
print("elapsed:", time.time() - timea)

Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:09<00:00, 4.89s/it]
C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py:1405: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on privateuseone, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cpu') before running .generate(). warnings.warn(
C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py:690: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
Traceback (most recent call last):
  File "C:\Users\2080\Desktop\LLama\PyTouchPackages\test.py", line 14, in <module>
    outputs = model.generate(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080/.cache\huggingface\modules\transformers_modules\mosaicml\mpt-7b\d8304854d4877849c3c0a78f3469512a84419e84\modeling_mpt.py", line 237, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080/.cache\huggingface\modules\transformers_modules\mosaicml\mpt-7b\d8304854d4877849c3c0a78f3469512a84419e84\modeling_mpt.py", line 152, in forward
    tok_emb = self.wte(input_ids)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
    return F.embedding(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'devices' argument must be DML
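The UserWarning above already points at the root cause for this particular snippet: input_ids is on privateuseone while the model is on cpu, because the model is never moved to the DML device. A minimal, untested fix sketch (reusing the names from the snippet above) is to move the model before generating; the aten::repeat_interleave.Tensor fallback from the original report may still surface afterwards:

model = model.to(dml)  # move the model's weights onto the DML device
inputs = tokenizer(prompt, return_tensors='pt').to(dml)  # keep inputs on the same device
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=True, temperature=0.75, return_dict_in_generate=True
)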

@albcunha

+1

@Adele101

Adele101 commented Oct 4, 2023

Hi all, thank you for submitting this issue. While I can't provide a timeline for resolution at the moment, please know that your feedback is valuable to us. We will follow up once we can review this issue.

@poo0054

poo0054 commented Apr 13, 2024

Is this the same as my issue? -> #578
