Describe the bug
When using Qwen to generate text, the model gives me output in other languages (mainly Chinese gibberish, but occasionally other languages as well).
This happens only when I use 'mps' on my MacBook; I have no such problem when I ssh into a GPU on the cloud and use 'cuda'.
Code example
I'm generating text by repeatedly taking the token with the highest logit (greedy sampling), decoding it to a string, and appending it to the output:
from tqdm import tqdm
from transformer_lens import HookedTransformer
import torch

DEVICE = torch.device('mps')

def get_model(model_name):
    model = HookedTransformer.from_pretrained_no_processing(
        model_name, device=DEVICE, dtype=torch.float16,
        default_padding_side='left', output_hidden_states=True)
    model.eval()
    model.to(DEVICE)
    return model

model = get_model("Qwen/Qwen1.5-4B-Chat")
# model = get_model("Qwen/Qwen1.5-1.8B-Chat")

def tokenize_prompt(model: HookedTransformer, prompt_str: str) -> tuple[list[int], str]:
    prompt_message = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt_str},
    ]
    prompt_chat_tokenized = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=True, add_generation_prompt=True)
    prompt_chat_str = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=False, add_generation_prompt=True)
    return prompt_chat_tokenized, prompt_chat_str

def generate_output(model: HookedTransformer, prompt_chat_str: str, max_new_tokens: int) -> str:
    output_str = prompt_chat_str
    for _ in tqdm(range(max_new_tokens)):
        # Run the model on the current prompt; only the logits are used here
        logits, cache = model.run_with_cache(output_str)
        # Greedy sampling (temperature 0): argmax over the final position's logits
        next_token = logits[0, -1].argmax()
        # Convert the predicted token id back to a string
        next_token_str = model.to_string(next_token)
        # Append the new token to the prompt for the next iteration
        output_str += next_token_str
        if next_token.item() == model.tokenizer.eos_token_id:
            break
    return output_str

def normal_generation(model, prompt, max_tokens):
    _, pt = tokenize_prompt(model, prompt)
    base_gen = generate_output(model, pt, max_tokens)
    return base_gen

print(normal_generation(model, "Hello!", 32))
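For comparison, the built-in generation path would look roughly like this (a minimal sketch; I'm assuming HookedTransformer.generate's max_new_tokens/do_sample arguments from the current TransformerLens API). The outputs below are from the manual loop above, not from this call:

# Sanity check via the built-in generate method (sketch; argument names assumed)
_, prompt_chat_str = tokenize_prompt(model, "Hello!")
builtin_gen = model.generate(prompt_chat_str, max_new_tokens=32, do_sample=False)
print(builtin_gen)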
Output
Qwen/Qwen1.5-4B-Chat
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
将是有关有关有关有关有关有关有关有关入入入入入有关入入入入入入入入入入入入入入入入入
Qwen/Qwen1.5-1.8B-Chat
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之书面书面当之当之当之书面
System Info
Describe the characteristics of your environment:
- How was TransformerLens installed: pip
- OS: MacOS 26 (Developer Beta)
- Python version: 3.10.18 (venv)
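For completeness, a quick check to confirm the MPS backend is actually available in my environment (plain PyTorch, nothing TransformerLens-specific):

import torch

print(torch.__version__)                   # PyTorch build in the venv
print(torch.backends.mps.is_available())   # True if the MPS backend can be used on this machine
print(torch.backends.mps.is_built())       # True if this PyTorch build was compiled with MPS support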
Additional context
I've tried this with both "Qwen/Qwen1.5-4B-Chat" and "Qwen/Qwen1.5-1.8B-Chat". The 1.8B model generally repeats the same 2-3 characters, while the 4B model produces a more diverse range of characters (still gibberish).
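A variation that might help narrow this down is loading the model in float32 instead of float16, since half precision on 'mps' seems like a plausible culprit. A minimal sketch, with only the dtype changed from the get_model call above:

model_fp32 = HookedTransformer.from_pretrained_no_processing(
    "Qwen/Qwen1.5-4B-Chat",
    device=DEVICE,
    dtype=torch.float32,   # only change from get_model(): float32 instead of float16
    default_padding_side='left',
    output_hidden_states=True)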
Any help is much appreciated!
Checklist
- I have checked that there is no similar issue in the repo (required)