How can I run a trained model? Can't run test_huggingface_import.py #119

Open
linlong1314 opened this issue Jul 19, 2023 · 1 comment

@linlong1314

How can I run a trained model, such as Projects/add/model.pt? When I run test_huggingface_import.py directly, it fails with:

File ".\minGPT\master\mingpt\model.py", line 202, in from_pretrained
assert len(keys) == len(sd)

The printed output also contains:

(act): NewGELUActivation(

@MkuuWaUjinga

This is caused by the custom implementation of multi-head attention. The CausalSelfAttention module registers a buffer holding the causal mask, which ensures that each token attends only to tokens to its left in the input sequence. However, state_dict() returns buffers as part of the model's state. PyTorch's native MultiheadAttention module handles this differently: the mask is not part of the model's state dict. That is why the assertion fails.
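To illustrate the point about buffers, here is a minimal sketch. ToyBlock is a hypothetical stand-in, not minGPT's actual class; the buffer name "bias" and the sizes are assumptions chosen for the example. It only shows that a buffer registered with the default settings adds one extra key to state_dict() beyond the parameters, which is the kind of mismatch a key-count assertion would trip over.

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in for an attention block (not minGPT's class)."""

    def __init__(self, block_size: int = 4):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        # register_buffer defaults to persistent=True, so this lower-triangular
        # mask is saved alongside the weights in state_dict().
        self.register_buffer("bias", torch.tril(torch.ones(block_size, block_size)))


block = ToyBlock()
param_names = {name for name, _ in block.named_parameters()}
state_names = set(block.state_dict().keys())

# Keys that are in the state dict but are not trainable parameters.
extra = state_names - param_names
print(sorted(extra))
print(len(param_names), len(state_names))
```

The state dict ends up with one more entry than the parameter list, even though the mask is a constant that never needs to be trained or saved.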

You can fix that by passing persistent=False to register_buffer where the buffer is registered in the __init__ method of the CausalSelfAttention module.
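A rough sketch of what that change could look like is below. CausalSelfAttentionSketch is a simplified, hypothetical module written for this comment; the layer names, default sizes, and the buffer name "bias" are assumptions, so adapt it to the actual code in mingpt/model.py rather than copying it.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttentionSketch(nn.Module):
    """Simplified causal self-attention (illustrative, not minGPT's code)."""

    def __init__(self, n_embd: int = 16, n_head: int = 4, block_size: int = 8):
        super().__init__()
        assert n_embd % n_head == 0
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # joint q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection
        self.n_head = n_head
        self.n_embd = n_embd
        mask = torch.tril(torch.ones(block_size, block_size))
        # persistent=False keeps the mask on the module (it still follows
        # .to(device)) but leaves it out of state_dict().
        self.register_buffer(
            "bias", mask.view(1, 1, block_size, block_size), persistent=False
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # Block attention to positions to the right of each token.
        att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


attn = CausalSelfAttentionSketch()
out = attn(torch.randn(2, 5, 16))
state_keys = list(attn.state_dict().keys())
print(state_keys)
```

The mask is still available as attn.bias in the forward pass, but it no longer appears among the state dict keys, so only the learnable weights are counted.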
