Strange model behavior when taking the softmax in the wrong dimension #132

Open
Cloud299 opened this issue Feb 14, 2024 · 0 comments
Cloud299 commented Feb 14, 2024

att = F.softmax(att, dim=-1)
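For reference, that line sits inside the causal self-attention forward pass. Here is a rough, self-contained sketch of the surrounding computation (toy shapes and variable names are mine, not verbatim from model.py), just to make clear which axis the softmax runs over:

```python
import math
import torch
import torch.nn.functional as F

B, nh, T, hs = 1, 2, 4, 8                 # toy batch size, heads, sequence length, head size
q = torch.randn(B, nh, T, hs)
k = torch.randn(B, nh, T, hs)
v = torch.randn(B, nh, T, hs)
mask = torch.tril(torch.ones(T, T)).view(1, 1, T, T)   # causal mask

# att[b, h, i, j] scores query position i against key position j -> shape (B, nh, T, T)
att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
att = att.masked_fill(mask == 0, float('-inf'))        # block attention to future positions j > i

# dim=-1 normalizes over j, i.e. over the keys each query looks at (rows sum to 1).
# dim=-2 would instead normalize over i, i.e. down each key's column of queries.
att = F.softmax(att, dim=-1)
y = att @ v                                             # weighted sum of values per query position
```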

I accidentally changed the softmax dimension to -2 instead of -1 and got incredibly low losses on both the training and validation sets when using the tiny_shakespeare dataset. However, when generating from the model, I get very low-quality results. What is the explanation?

My guess is that I'm somehow leaking information when taking the softmax in the wrong dimension, which may explain why the training loss is very low. However, I don't quite get why validation loss would also be low.
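A small standalone experiment (toy code, not from the repo) seems consistent with that guess: with dim=-2 each weight is normalized down a key's column, so the denominator for position i includes the scores of later queries. Perturbing only the last token's query then changes the outputs at earlier positions, which never happens with dim=-1:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, hs = 5, 8
q, k, v = torch.randn(T, hs), torch.randn(T, hs), torch.randn(T, hs)
mask = torch.tril(torch.ones(T, T))                     # causal mask

def attend(q, dim):
    att = (q @ k.T) / hs ** 0.5
    att = att.masked_fill(mask == 0, float('-inf'))
    return F.softmax(att, dim=dim) @ v

# Change only the LAST token's query; inputs at earlier positions are untouched.
q_perturbed = q.clone()
q_perturbed[-1] += 1.0

# Correct normalization (dim=-1): earlier positions ignore the future query.
print(torch.allclose(attend(q, -1)[:-1], attend(q_perturbed, -1)[:-1]))   # True
# Buggy normalization (dim=-2): each weight's denominator sums over queries i >= j,
# so earlier positions' outputs change when a future query changes -> leakage.
print(torch.allclose(attend(q, -2)[:-1], attend(q_perturbed, -2)[:-1]))   # False
```

If that is what is happening, the leakage is within-sequence rather than train/val contamination, which would fit both losses dropping while generation (where no future tokens exist to leak from) stays poor.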

[attached image]

@karpathy Any idea why this is the case?
