
'aten::repeat_interleave.Tensor' CPU-fallback warning causes "unbox expects Dml at::Tensor as inputs" error #449

Open
NateAGeek opened this issue May 3, 2023 · 4 comments
Labels
pytorch-directml Issues in PyTorch when using its DirectML backend

Comments

@NateAGeek

NateAGeek commented May 3, 2023

Hello,

I am currently trying to run a slightly modified StableLM example.

import torch
import torch_directml
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

# Get Direct ML device
dml = torch_directml.device()

# Load in the model
model_name = "stabilityai/stablelm-tuned-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./offload",
).to(dml)

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to(dml)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

I am getting this warning first:

transformers\generation\utils.py:690: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)

Then I get this error:

RuntimeError: tensor.device().type() == at::DeviceType::PrivateUse1 INTERNAL ASSERT FAILED at "D:\\a\\_work\\1\\s\\pytorch-directml-plugin\\torch_directml\\csrc\\dml\\DMLTensor.cpp":31, please report a bug to PyTorch. unbox expects Dml at::Tensor as inputs

I assume this error happens because the tokenized inputs are supposed to stay on the GPU, but since the DirectML backend does not support aten::repeat_interleave.Tensor, that operation falls back to the CPU while my model remains on the GPU, hence the "unbox expects Dml at::Tensor as inputs" error. Is there an update on when aten::repeat_interleave.Tensor will be supported, or a possible workaround for the time being?
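A rough, untested workaround sketch (my own assumption, not an official torch-directml API): monkey-patch torch.Tensor.repeat_interleave so this one operator runs on the CPU and its result is moved back onto the DML device, keeping downstream ops fed with DML tensors. It assumes torch-directml tensors report the device type "privateuseone", as the warnings in this thread suggest:

import torch

# Untested sketch: route repeat_interleave through the CPU and move the result
# back to the original (DML) device. "privateuseone" is the device type that
# torch-directml tensors report (per the warnings in this thread).
_orig_repeat_interleave = torch.Tensor.repeat_interleave

def _cpu_repeat_interleave(self, repeats, *args, **kwargs):
    if self.device.type == "privateuseone":
        # `repeats` may be an int or a tensor; move tensors to the CPU too.
        if isinstance(repeats, torch.Tensor):
            repeats = repeats.cpu()
        out = _orig_repeat_interleave(self.cpu(), repeats, *args, **kwargs)
        return out.to(self.device)
    return _orig_repeat_interleave(self, repeats, *args, **kwargs)

torch.Tensor.repeat_interleave = _cpu_repeat_interleave

Applied before model.generate(...), this should let the expand step in transformers' generation utilities return a DML tensor instead of a CPU one; whether it avoids the internal assert is untested.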

Thank you :D

@fdwr fdwr added the pytorch-directml Issues in PyTorch when using its DirectML backend label May 10, 2023
@AgentSmithers

I'm receiving the same issue, running on an AMD 7900 XT:
import time

import torch
import torch_directml
from transformers import AutoTokenizer, AutoModelForCausalLM

dml = torch_directml.device()

model_name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: the model is loaded on the CPU and never moved to the DML device
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

timea = time.time()
prompt = "A lion is"
inputs = tokenizer(prompt, return_tensors='pt').to(dml)  # inputs go to DML
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=True, temperature=0.75, return_dict_in_generate=True
)
token = outputs.sequences[0]
output_str = tokenizer.decode(token)
print(output_str)
print("elapsed:", time.time() - timea)

Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:09<00:00, 4.89s/it]
C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py:1405: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on privateuseone, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cpu') before running .generate(). warnings.warn(
C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py:690: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
Traceback (most recent call last):
  File "C:\Users\2080\Desktop\LLama\PyTouchPackages\test.py", line 14, in <module>
    outputs = model.generate(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "C:\Users\2080\miniconda3\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080/.cache\huggingface\modules\transformers_modules\mosaicml\mpt-7b\d8304854d4877849c3c0a78f3469512a84419e84\modeling_mpt.py", line 237, in forward
    outputs = self.transformer(input_ids=input_ids, past_key_values=past_key_values, attention_mask=attention_mask, prefix_mask=prefix_mask, sequence_id=sequence_id, return_dict=return_dict, output_attentions=output_attentions, output_hidden_states=output_hidden_states, use_cache=use_cache)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080/.cache\huggingface\modules\transformers_modules\mosaicml\mpt-7b\d8304854d4877849c3c0a78f3469512a84419e84\modeling_mpt.py", line 152, in forward
    tok_emb = self.wte(input_ids)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
    return F.embedding(
  File "C:\Users\2080\miniconda3\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'devices' argument must be DML
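The UserWarning above already points at the root cause for this particular snippet: input_ids is on privateuseone while the model is on cpu, because the model is never moved to the DML device. A minimal, untested fix sketch (reusing the names from the snippet above) is to move the model before generating; the aten::repeat_interleave.Tensor fallback from the original report may still surface afterwards:

model = model.to(dml)  # move the model's weights onto the DML device
inputs = tokenizer(prompt, return_tensors='pt').to(dml)  # keep inputs on the same device
outputs = model.generate(
    **inputs, max_new_tokens=20, do_sample=True, temperature=0.75, return_dict_in_generate=True
)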

@albcunha

+1

@Adele101

Adele101 commented Oct 4, 2023

Hi all, thank you for submitting this issue. While I can't provide a timeline for resolution at the moment, please know that your feedback is valuable to us. We will follow up once we can review this issue.

@poo0054

poo0054 commented Apr 13, 2024

Is this the same as my issue? -> #578
