
[Bug Report] Qwen gives nonsensical output when used on 'mps' #1008

@mistovek016

Description

Describe the bug
When using Qwen to generate text, it gives me output in other languages (mainly Chinese gibberish, but occasionally some other languages as well).

This happens only when I use 'mps' on my MacBook; I have no such problem when I ssh into a GPU machine in the cloud and use 'cuda'.

Code example
I'm generating text with a manual greedy-decoding loop: at each step I take the token with the highest logit, decode it to a string, and append it to the output...

from tqdm import tqdm
from transformer_lens.hook_points import HookPoint
from transformer_lens import HookedTransformer
import torch

DEVICE = torch.device('mps')

def get_model(model_name):
    model = HookedTransformer.from_pretrained_no_processing(
        model_name,
        device=DEVICE,
        dtype=torch.float16,
        default_padding_side='left',
        output_hidden_states=True,
    )
    model.eval()
    model.to(DEVICE)
    return model

model = get_model("Qwen/Qwen1.5-4B-Chat")
# model = get_model("Qwen/Qwen1.5-1.8B-Chat")

def tokenize_prompt(model: HookedTransformer, prompt_str: str) -> tuple[list[int], str]:
    
    prompt_message = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt_str}
    ]

    prompt_chat_tokenized = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=True, add_generation_prompt=True)
    prompt_chat_str = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=False, add_generation_prompt=True)
        
    return prompt_chat_tokenized, prompt_chat_str

def generate_output(model: HookedTransformer, prompt_chat_str: str, max_new_tokens: int) -> str:
    output_str = prompt_chat_str
    for i in tqdm(range(max_new_tokens)):
        # Get the logits and cache for the current prompt
        logits, cache = model.run_with_cache(output_str)

        # Get the predicted next token (using argmax for temperature 0)
        next_token = logits[0, -1].argmax()

        # Convert the next token to a string
        next_token_str = model.to_string(next_token)

        # Append the new token to the prompt for the next iteration
        output_str += next_token_str
        
        if next_token.item() == model.tokenizer.eos_token_id:
            break
    
    return output_str

def normal_generation(model, prompt, max_tokens):
    _, pt = tokenize_prompt(model, prompt)
    base_gen = generate_output(model, pt, max_tokens)
    return base_gen

print(normal_generation(model, "Hello!", 32))

Output
Qwen/Qwen1.5-4B-Chat

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
将是有关有关有关有关有关有关有关有关入入入入入有关入入入入入入入入入入入入入入入入入

Qwen/Qwen1.5-1.8B-Chat

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之书面书面当之当之当之书面
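
In case it helps narrow things down, the same behaviour can be cross-checked with TransformerLens' built-in generation instead of the manual loop above (a minimal sketch, assuming HookedTransformer.generate supports greedy decoding via do_sample=False):

def builtin_generation(model: HookedTransformer, prompt: str, max_tokens: int) -> str:
    # Build the same chat-formatted prompt string as in the code above
    _, prompt_chat_str = tokenize_prompt(model, prompt)
    # Let the library run the decoding loop; do_sample=False takes the argmax
    # token at each step, matching the manual greedy loop
    return model.generate(prompt_chat_str, max_new_tokens=max_tokens, do_sample=False)

print(builtin_generation(model, "Hello!", 32))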

System Info
Describe the characteristics of your environment:

  • How was TransformerLens installed: pip
  • OS: MacOS 26 (Developer Beta)
  • Python version: 3.10.18 (venv)

Additional context
I've tried this with both "Qwen/Qwen1.5-4B-Chat" and "Qwen/Qwen1.5-1.8B-Chat". The 1.8B model generally repeats 2-3 characters, while the 4B model produces a more diverse range of characters (still gibberish).
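
In case it is a float16 precision issue on 'mps', here is a minimal sketch of a check (reusing the names from the code above and only switching the dtype to torch.float32):

# Sketch: reload the smaller model in float32 on 'mps' to see whether the
# gibberish disappears once float16 is taken out of the picture.
model_fp32 = HookedTransformer.from_pretrained_no_processing(
    "Qwen/Qwen1.5-1.8B-Chat",
    device=DEVICE,
    dtype=torch.float32,
    default_padding_side='left',
)
model_fp32.eval()
print(normal_generation(model_fp32, "Hello!", 32))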

Any help is much appreciated!

Checklist

  • I have checked that there is no similar issue in the repo (required)
