
BrownianNotion

Description

Hi, I'm new here; feedback is welcome, and let me know if I've missed anything!

Due to a bug in PyTorch 2.8.0's F.linear on MPS (pytorch/pytorch#161640), the lines below

out = F.linear(
    z.reshape(z.shape[0], z.shape[1], self.cfg.d_head * self.cfg.n_heads),
    w,
    self.b_O,
)

from https://github.com/TransformerLensOrg/TransformerLens/blob/main/transformer_lens/components/abstract_attention.py#L302-L306 produce incorrect attention outputs on MPS; CPU works fine. I'm on macOS Sequoia 15.6.1.

I've added a unit test that reproduces the issue in commit f903629.
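
For reference, a hypothetical minimal MPS-vs-CPU consistency check in the spirit of that test could look like the sketch below. This is not the test from the commit: the test name and shapes are made up, and whether it actually triggers the upstream bug depends on the exact strides of z in the real attention forward pass.

import pytest
import torch
import torch.nn.functional as F

@pytest.mark.skipif(not torch.backends.mps.is_available(), reason="MPS not available")
def test_out_projection_matches_cpu_on_mps():
    # Illustrative shapes; the real trigger depends on how z is produced upstream.
    torch.manual_seed(0)
    batch, pos, n_heads, d_head, d_model = 2, 5, 4, 8, 32
    z = torch.randn(batch, pos, n_heads, d_head)
    w = torch.randn(d_model, n_heads * d_head)
    b = torch.randn(d_model)

    # Reference result on CPU.
    expected = F.linear(z.reshape(batch, pos, n_heads * d_head), w, b)

    # Same computation on MPS, moved back to CPU for comparison.
    actual = F.linear(
        z.to("mps").reshape(batch, pos, n_heads * d_head),
        w.to("mps"),
        b.to("mps"),
    ).cpu()

    torch.testing.assert_close(actual, expected, rtol=1e-4, atol=1e-4)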

To fix this, I have replaced F.linear with einops.einsum, which is also more consistent with the rest of the class.
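
For illustration, here is a self-contained sketch (on CPU) of why the two formulations agree, assuming the shape conventions used in abstract_attention.py: z is [batch, pos, head_index, d_head], W_O is [head_index, d_head, d_model], and w is W_O rearranged to [d_model, head_index * d_head]. The exact replacement in the PR may differ in detail.

import torch
import torch.nn.functional as F
import einops

# Illustrative shapes only.
batch, pos, n_heads, d_head, d_model = 2, 3, 4, 8, 32
z = torch.randn(batch, pos, n_heads, d_head)   # per-head attention results
W_O = torch.randn(n_heads, d_head, d_model)    # output projection weights
b_O = torch.randn(d_model)                     # output projection bias

# F.linear route: flatten the head dimensions and use a (d_model, n_heads * d_head) weight.
w = einops.rearrange(W_O, "head_index d_head d_model -> d_model (head_index d_head)")
out_linear = F.linear(z.reshape(batch, pos, n_heads * d_head), w, b_O)

# einops.einsum route: contract over head_index and d_head directly, no flattening needed.
out_einsum = (
    einops.einsum(
        z,
        W_O,
        "batch pos head_index d_head, head_index d_head d_model -> batch pos d_model",
    )
    + b_O
)

torch.testing.assert_close(out_einsum, out_linear)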

Related issues: #1008 #1062

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@BrownianNotion BrownianNotion changed the title Fix attn mps Fix attention calculation on mps Sep 27, 2025
@BrownianNotion BrownianNotion changed the title Fix attention calculation on mps Fix attention calculation on mps for torch 2.8.0 Sep 27, 2025
@BrownianNotion
Author

An alternative, simpler and less invasive fix: call .contiguous() on the reshaped z and on w before the F.linear call.
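
Concretely, that workaround would look something like the following (same variables as the excerpt above; an untested sketch, not necessarily the final change):

out = F.linear(
    # Force contiguous memory layouts so the MPS F.linear kernel sees plain strides.
    z.reshape(z.shape[0], z.shape[1], self.cfg.d_head * self.cfg.n_heads).contiguous(),
    w.contiguous(),
    self.b_O,
)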
