
Continued training of FlauBERT (with --reload_model) -- Question about vocab size #40

mcriggs opened this issue Mar 22, 2022 · 1 comment

mcriggs commented Mar 22, 2022

Hello. :)

I would like to use the "--reload_model" option with your train.py command to further train one of your pretrained FlauBERT models.

When I tried to run train.py with the "--reload_model" option, I got an error message saying that there was a "size mismatch" between the pretrained FlauBERT model and the model I was trying to train.

The error message referred to a "shape torch.Size([67542]) from checkpoint". This was for the flaubert_base_uncased model. I assume that the number 67542 is the vocabulary size of flaubert-base-uncased.
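For what it's worth, that number can be double-checked directly from the checkpoint. A rough sketch, assuming the checkpoint is a regular PyTorch .pth file whose state dict sits either at the top level or under a "model" key (the key name and the file path below are guesses):

```python
import torch

# Load the published checkpoint on CPU and locate its state dict.
ckpt = torch.load("path/to/flaubert_base_uncased.pth", map_location="cpu")  # hypothetical path
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print every parameter whose name suggests an embedding table;
# the first dimension should be the vocabulary size (67542 here).
for name, tensor in state_dict.items():
    if "embedding" in name.lower():
        print(name, tuple(tensor.shape))
```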

In order to use the "--reload_model" option with your pretrained FlauBERT models, do I need to ensure that the vocabulary of my training data is identical to that of the pretrained model? If so, do you think that I could manage that simply by concatenating the "vocab" file of the pretrained model with my training data?

Thank you in advance for your help!

formiel (Contributor) commented Nov 1, 2022

Hello @mcriggs !

I'm so sorry for the extremely late reply! I was on a long leave of several months, and since coming back to work I have been overwhelmed by deadlines. I'm not sure whether my response is still useful to you now, but let me try anyway.

To use the --reload_model option, you need to have the same vocabulary size. If you want to skip loading the embedding layer, you can add strict=False on this line. However, you should check carefully which layers are actually loaded when using this flag, as it can silently skip other layers if there are mismatches in the keys or dimensions.
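A minimal sketch of that kind of check (not the actual FlauBERT/XLM code; `model` and `reloaded` below are placeholders for the objects handled in train.py): entries whose name or shape does not match the current model are dropped first, so that only compatible tensors are copied, and the return value of load_state_dict shows what was left untouched:

```python
# Placeholders: `model` is the model being trained, `reloaded` is the state dict
# read from the pre-trained checkpoint.
model_state = model.state_dict()

# Keep only tensors whose names and shapes match the current model.
compatible = {
    k: v for k, v in reloaded.items()
    if k in model_state and v.shape == model_state[k].shape
}
skipped = [k for k in reloaded if k not in compatible]
print("Dropped from the checkpoint (name or shape mismatch):", skipped)

# With strict=False, parameters not covered by `compatible` keep their current
# (random) initialization; missing_keys lists exactly which ones those are.
result = model.load_state_dict(compatible, strict=False)
print("Model parameters left at their current initialization:", result.missing_keys)
```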

I suspect that simply concatenating the vocab file of the pre-trained model with your training data will not work: the resulting vocabulary is not guaranteed to have the same size as that of the pre-trained model, and even if it does, the indexing is likely to be different. But you can try it and see how it performs compared to using random initialization for the embedding layer.
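If the vocabularies do end up differing, a common middle ground between those two options (it is not something from the FlauBERT code) is to copy the pre-trained vectors only for the tokens present in both vocabularies and keep the random initialization for the rest. A rough sketch, assuming `old_word2id` / `new_word2id` map tokens to indices for the pre-trained and the new vocabulary, and `old_emb` / `new_emb` are the corresponding embedding matrices:

```python
import torch

# Copy the pre-trained vector for every token that exists in both vocabularies;
# tokens only present in the new vocabulary keep their random initialization.
with torch.no_grad():
    copied = 0
    for word, new_idx in new_word2id.items():
        old_idx = old_word2id.get(word)
        if old_idx is not None:
            new_emb[new_idx] = old_emb[old_idx]
            copied += 1

print(f"Reused {copied} / {len(new_word2id)} pre-trained embedding rows.")
```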

Please feel free to let me know if there is anything else I can help you with.
