Wrong dimension of bert-base-italian-xxl vocabulary #7

Hi, thanks again for these models! I was trying to use the bert-base-italian-xxl models, but I noticed a discrepancy between the vocabulary size in the config.json file (32102) and the actual size of the vocabulary (31102). Is it possible that the wrong vocabulary was uploaded?
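A minimal sketch of how to reproduce the reported mismatch, assuming the cased xxl variant on the Hugging Face model hub (the exact model id is my assumption; the issue does not name one):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "dbmdz/bert-base-italian-xxl-cased"  # assumed variant

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.vocab_size)  # 32102 -- value stored in config.json
print(len(tokenizer))     # 31102 -- entries actually present in the vocab file
```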
Comments
Hi @f-wole, thanks for that hint! The vocab file is correct, but the config file contains a wrong vocab size. I'll fix that now :)
Update on that: unfortunately, I used the vocab size value of 32102 in the configuration for training the model. To fix it I would need to re-train the model, which is currently beyond my resources. However, the model works, and I also ran all evaluations with the configuration that is deployed on the model hub.
Yes, I saw that the model expects a vocabulary size of 32102 from the dimension of the word_embeddings matrix. So are you suggesting it would be possible to use bert-base-italian-xxl with a vocabulary of size 31102?
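(The snippet this comment refers to did not survive extraction; a sketch of the check it describes, again assuming the cased variant:)

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# The first dimension of the embedding matrix is the vocab size
# baked in at training time, not the size of the shipped vocab file.
print(model.embeddings.word_embeddings.weight.shape)
# torch.Size([32102, 768])
```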
It is possible; I ran evaluations for NER and PoS tagging with the example script in the Hugging Face Transformers library. I just updated the README to mention the mismatch between the vocab and config sizes :) Thanks again for finding this!
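To illustrate why the mismatch is harmless in practice, a small sketch (model id assumed as above):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "dbmdz/bert-base-italian-xxl-cased"  # assumed variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Every id the tokenizer can emit is below 31102, so all lookups hit
# valid rows of the 32102-row embedding matrix; the extra rows are
# simply never indexed.
inputs = tokenizer("Una frase di prova.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```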