
[Bug Report] Qwen gives nonsensical output when used on 'mps' #1008

@mistovek016

Description

Describe the bug
When using Qwen to generate text, it gives me output in other languages (mainly Chinese gibberish, but occasionally some other languages as well).

This happens only when I use 'mps' on my MacBook; I have no such problem when I ssh into a GPU machine in the cloud and use 'cuda'.

Code example
I'm generating text with a manual greedy-decoding loop: at each step I take the token with the highest logit, decode it to a string, and append it to the output...

from tqdm import tqdm
from transformer_lens.hook_points import HookPoint
from transformer_lens import HookedTransformer
import torch

DEVICE = torch.device('mps')

def get_model(model_name):
    model = HookedTransformer.from_pretrained_no_processing(
        model_name,
        device=DEVICE,
        dtype=torch.float16,
        default_padding_side='left',
        output_hidden_states=True,
    )
    model.eval()
    model.to(DEVICE)
    return model

model = get_model("Qwen/Qwen1.5-4B-Chat")
# model = get_model("Qwen/Qwen1.5-1.8B-Chat")

def tokenize_prompt(model: HookedTransformer, prompt_str: str) -> tuple[list[int], str]:
    
    prompt_message = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt_str}
    ]

    prompt_chat_tokenized = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=True, add_generation_prompt=True)
    prompt_chat_str = model.tokenizer.apply_chat_template(
        prompt_message, tokenize=False, add_generation_prompt=True)
        
    return prompt_chat_tokenized, prompt_chat_str

def generate_output(model: HookedTransformer, prompt_chat_str: str, max_new_tokens: int) -> str:
    output_str = prompt_chat_str
    for i in tqdm(range(max_new_tokens)):
        # Get the logits and cache for the current prompt
        logits, cache = model.run_with_cache(output_str)

        # Get the predicted next token (using argmax for temperature 0)
        next_token = logits[0, -1].argmax()

        # Convert the next token to a string
        next_token_str = model.to_string(next_token)

        # Append the new token to the prompt for the next iteration
        output_str += next_token_str
        
        if next_token.item() == model.tokenizer.eos_token_id:
            break
    
    return output_str

def normal_generation(model, prompt, max_tokens):
    _, pt = tokenize_prompt(model, prompt)
    base_gen = generate_output(model, pt, max_tokens)
    return base_gen

print(normal_generation(model, "Hello!", 32))

Output
Qwen/Qwen1.5-4B-Chat

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
将是有关有关有关有关有关有关有关有关入入入入入有关入入入入入入入入入入入入入入入入入

Qwen/Qwen1.5-1.8B-Chat

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之当之书面书面当之当之当之书面
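
In case it helps narrow things down, the same behaviour can be cross-checked with TransformerLens' built-in generation instead of the manual loop above (a minimal sketch, assuming HookedTransformer.generate supports greedy decoding via do_sample=False):

def builtin_generation(model: HookedTransformer, prompt: str, max_tokens: int) -> str:
    # Build the same chat-formatted prompt string as in the code above
    _, prompt_chat_str = tokenize_prompt(model, prompt)
    # Let the library run the decoding loop; do_sample=False takes the argmax
    # token at each step, matching the manual greedy loop
    return model.generate(prompt_chat_str, max_new_tokens=max_tokens, do_sample=False)

print(builtin_generation(model, "Hello!", 32))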

System Info
Describe the characteristics of your environment:

  • How was TransformerLens installed: pip
  • OS: MacOS 26 (Developer Beta)
  • Python version: 3.10.18 (venv)

Additional context
I've tried this with both "Qwen/Qwen1.5-4B-Chat" and "Qwen/Qwen1.5-1.8B-Chat". The 1.8B model generally repeats 2-3 characters, while the 4B model produces a more diverse range of characters (still gibberish).
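
In case it is a float16 precision issue on 'mps', here is a minimal sketch of a check (reusing the names from the code above and only switching the dtype to torch.float32):

# Sketch: reload the smaller model in float32 on 'mps' to see whether the
# gibberish disappears once float16 is taken out of the picture.
model_fp32 = HookedTransformer.from_pretrained_no_processing(
    "Qwen/Qwen1.5-1.8B-Chat",
    device=DEVICE,
    dtype=torch.float32,
    default_padding_side='left',
)
model_fp32.eval()
print(normal_generation(model_fp32, "Hello!", 32))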

Any help is much appreciated!

Checklist

  • I have checked that there is no similar issue in the repo (required)
