[W AddKernelNpu.cpp:82] Warning: The oprator of add is executed, Currently High Accuracy but Low Performance OP with 64-bit has been used, Please Do Some Cast at Python Functions with 32-bit for Better Performance! (function operator())
[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
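The first warning asks for 64-bit integer tensors to be cast to 32-bit on the Python side. The tokenizer output (`input_ids`, `attention_mask`, `position_ids`) is typically `int64`, PyTorch's default integer dtype. A minimal sketch of that advice is below. It assumes the model's remote code accepts `int32` inputs, which has not been verified here, and it uses a hypothetical helper, `downcast_int64`, that would be applied before moving the tensors to the NPU. Because `generate` may still create `int64` tensors internally, this may not silence the warning.

```python
import torch

INT32 = torch.iinfo(torch.int32)

def downcast_int64(batch):
    """Hypothetical helper: return a copy of `batch` with int64 tensors cast to int32.

    Raises ValueError if a value would not fit in int32 instead of silently wrapping.
    Tensors of any other dtype are passed through unchanged.
    """
    out = {}
    for name, t in batch.items():
        if t.dtype == torch.int64:
            if t.numel() > 0 and (t.max().item() > INT32.max or t.min().item() < INT32.min):
                raise ValueError(f"{name} has values outside the int32 range")
            t = t.to(torch.int32)
        out[name] = t
    return out

# Usage on CPU with stand-in tensors shaped like tokenizer output
batch = {
    "input_ids": torch.tensor([[1, 2, 3]]),                 # int64 by default
    "attention_mask": torch.ones(1, 3, dtype=torch.int64),
    "scores": torch.zeros(1, 3),                            # float32, left untouched
}
small = downcast_int64(batch)
print({name: t.dtype for name, t in small.items()})
```

Token IDs and masks fit comfortably in `int32`, so the range check should never fire for them. It is there so the helper fails loudly if it is reused on other data.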
Environment: `torch-npu` 2.2.0
```python
import torch
import torch_npu  # registers the "npu" device with PyTorch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "npu"
model_path = "/home/ma-user/THUDM/glm49bchat"  # local copy of THUDM/glm49bchat

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

query = "你好"  # "Hello"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
)
# Move every returned tensor (input_ids, attention_mask, position_ids, ...) to the NPU
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to(device).eval()

# top_k=1 restricts sampling to the most likely token, so decoding is effectively greedy
gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens and keep only the generated continuation
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
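The `aten::isin.Tensor_Tensor_out` fallback is reached inside `model.generate`, presumably through its special-token checks, and the NPU backend runs it on the CPU. As a sketch, the same result can be built from an elementwise comparison and a reduction. This assumes `eq` and `any` run natively on the NPU for these dtypes, which the log does not confirm. `isin_broadcast` is a hypothetical helper, and wiring it into `transformers` would require patching its generation code, which is not shown here.

```python
import torch

def isin_broadcast(elements: torch.Tensor, test_elements: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper with the same result as torch.isin(elements, test_elements).

    Avoids aten::isin by comparing every element against every test value.
    """
    test = test_elements.reshape(-1)
    # elements.unsqueeze(-1) has shape (*elements.shape, 1); broadcasting against
    # test (shape (k,)) gives (*elements.shape, k), then any() reduces over k
    return (elements.unsqueeze(-1) == test).any(dim=-1)

# Usage on CPU: mark which generated token IDs are in a set of stop IDs
ids = torch.tensor([[5, 2, 7, 2]])
stop_ids = torch.tensor([2, 9])
mask = isin_broadcast(ids, stop_ids)
print(mask)
```

The trade-off is memory. The intermediate comparison tensor has `elements.numel() * test_elements.numel()` entries, which is cheap for a handful of special-token IDs but makes this a poor general replacement for `torch.isin`.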