Strange model behavior when taking the softmax in the wrong dimension #132

Open
Cloud299 opened this issue Feb 14, 2024 · 0 comments
Cloud299 commented Feb 14, 2024

att = F.softmax(att, dim=-1)
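For reference, that line sits inside the causal self-attention forward pass. Here is a rough, self-contained sketch of the surrounding computation (toy shapes and variable names are mine, not verbatim from model.py), just to make clear which axis the softmax runs over:

```python
import math
import torch
import torch.nn.functional as F

B, nh, T, hs = 1, 2, 4, 8                 # toy batch size, heads, sequence length, head size
q = torch.randn(B, nh, T, hs)
k = torch.randn(B, nh, T, hs)
v = torch.randn(B, nh, T, hs)
mask = torch.tril(torch.ones(T, T)).view(1, 1, T, T)   # causal mask

# att[b, h, i, j] scores query position i against key position j -> shape (B, nh, T, T)
att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
att = att.masked_fill(mask == 0, float('-inf'))        # block attention to future positions j > i

# dim=-1 normalizes over j, i.e. over the keys each query looks at (rows sum to 1).
# dim=-2 would instead normalize over i, i.e. down each key's column of queries.
att = F.softmax(att, dim=-1)
y = att @ v                                             # weighted sum of values per query position
```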

I accidentally changed the softmax dimension to -2 instead of -1 and got incredibly low losses on both the training and validation sets when using the tiny_shakespeare dataset. However, when generating from the model, I get very low-quality results. What is the explanation?

My guess is that I'm somehow leaking information when taking the softmax in the wrong dimension, which may explain why the training loss is very low. However, I don't quite get why validation loss would also be low.
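A small standalone experiment (toy code, not from the repo) seems consistent with that guess: with dim=-2 each weight is normalized down a key's column, so the denominator for position i includes the scores of later queries. Perturbing only the last token's query then changes the outputs at earlier positions, which never happens with dim=-1:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, hs = 5, 8
q, k, v = torch.randn(T, hs), torch.randn(T, hs), torch.randn(T, hs)
mask = torch.tril(torch.ones(T, T))                     # causal mask

def attend(q, dim):
    att = (q @ k.T) / hs ** 0.5
    att = att.masked_fill(mask == 0, float('-inf'))
    return F.softmax(att, dim=dim) @ v

# Change only the LAST token's query; inputs at earlier positions are untouched.
q_perturbed = q.clone()
q_perturbed[-1] += 1.0

# Correct normalization (dim=-1): earlier positions ignore the future query.
print(torch.allclose(attend(q, -1)[:-1], attend(q_perturbed, -1)[:-1]))   # True
# Buggy normalization (dim=-2): each weight's denominator sums over queries i >= j,
# so earlier positions' outputs change when a future query changes -> leakage.
print(torch.allclose(attend(q, -2)[:-1], attend(q_perturbed, -2)[:-1]))   # False
```

If that is what is happening, the leakage is within-sequence rather than train/val contamination, which would fit both losses dropping while generation (where no future tokens exist to leak from) stays poor.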

[attached image]

@karpathy Any idea why this is the case?
