
Continued training of FlauBERT (with --reload_model) -- Question about vocab size #40

mcriggs opened this issue Mar 22, 2022 · 1 comment

mcriggs commented Mar 22, 2022

Hello. :)

I would like to use the "--reload_model" option with your train.py command to further train one of your pretrained FlauBERT models.

When I tried to run train.py with the "--reload_model" option, I got an error message saying that there was a "size mismatch" between the pretrained FlauBERT model and the model I was trying to train.

The error message referred to a "shape torch.Size([67542]) from checkpoint". This was for the flaubert_base_uncased model. I assume that the number 67542 is the vocabulary size of flaubert-base-uncased.
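For what it's worth, that number can be double-checked directly from the checkpoint. A rough sketch, assuming the checkpoint is a regular PyTorch .pth file whose state dict sits either at the top level or under a "model" key (the key name and the file path below are guesses):

```python
import torch

# Load the published checkpoint on CPU and locate its state dict.
ckpt = torch.load("path/to/flaubert_base_uncased.pth", map_location="cpu")  # hypothetical path
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print every parameter whose name suggests an embedding table;
# the first dimension should be the vocabulary size (67542 here).
for name, tensor in state_dict.items():
    if "embedding" in name.lower():
        print(name, tuple(tensor.shape))
```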

In order to use the "--reload_model" option with your pretrained FlauBERT models, do I need to ensure that the vocabulary of my training data is identical to that of the pretrained model? If so, do you think that I could manage that simply by concatenating the "vocab" file of the pretrained model with my training data?

Thank you in advance for your help!

formiel (Contributor) commented Nov 1, 2022

Hello @mcriggs !

I'm so sorry for the extremely late reply! I was on a long leave of several months, and since coming back to work I have been overwhelmed by deadlines. I'm not sure whether my response is still useful to you now, but let me try anyway.

To use the --reload_model option, you need to have the same vocabulary size. If you want to skip loading the embedding layer, you can add strict=False on this line. However, you should check carefully which layers are actually loaded when using this flag, as it can silently skip other layers if there are mismatches in the keys or dimensions.
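A minimal sketch of that kind of check (not the actual FlauBERT/XLM code; `model` and `reloaded` below are placeholders for the objects handled in train.py): entries whose name or shape does not match the current model are dropped first, so that only compatible tensors are copied, and the return value of load_state_dict shows what was left untouched:

```python
# Placeholders: `model` is the model being trained, `reloaded` is the state dict
# read from the pre-trained checkpoint.
model_state = model.state_dict()

# Keep only tensors whose names and shapes match the current model.
compatible = {
    k: v for k, v in reloaded.items()
    if k in model_state and v.shape == model_state[k].shape
}
skipped = [k for k in reloaded if k not in compatible]
print("Dropped from the checkpoint (name or shape mismatch):", skipped)

# With strict=False, parameters not covered by `compatible` keep their current
# (random) initialization; missing_keys lists exactly which ones those are.
result = model.load_state_dict(compatible, strict=False)
print("Model parameters left at their current initialization:", result.missing_keys)
```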

I suspect that simply concatenating the vocab file of the pre-trained model with your training data will not work: the resulting vocabulary is not guaranteed to have the same size as that of the pre-trained model, and even if it does, the indexing is likely to be different. But you can try it and see how it performs compared to using random initialization for the embedding layer.
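If the vocabularies do end up differing, a common middle ground between those two options (it is not something from the FlauBERT code) is to copy the pre-trained vectors only for the tokens present in both vocabularies and keep the random initialization for the rest. A rough sketch, assuming `old_word2id` / `new_word2id` map tokens to indices for the pre-trained and the new vocabulary, and `old_emb` / `new_emb` are the corresponding embedding matrices:

```python
import torch

# Copy the pre-trained vector for every token that exists in both vocabularies;
# tokens only present in the new vocabulary keep their random initialization.
with torch.no_grad():
    copied = 0
    for word, new_idx in new_word2id.items():
        old_idx = old_word2id.get(word)
        if old_idx is not None:
            new_emb[new_idx] = old_emb[old_idx]
            copied += 1

print(f"Reused {copied} / {len(new_word2id)} pre-trained embedding rows.")
```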

Please feel free to let me know if there is anything else I can help you with.
