"OOM during optimization" when fine-tuning NLLB #4930
Comments
What were your hyperparameter settings?
@FayZ676 I used default parameters from
All hyperparameters:
For reference, I tried fine-tuning GPT-NeoX-20B on my setup (4x 3090s) and was told by the devs that I needed at least 13 bytes of memory per parameter. The largest model I could successfully fine-tune was the 2B-parameter one. It looks like you're using the config for a 3.3B-parameter model on a single 3090, so you may simply not have enough memory to fine-tune models larger than 600M. I don't know for sure, so if someone can confirm the memory requirements for fairseq, that would be great.
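The back-of-the-envelope math above can be sketched as follows. This is only an illustration, not a confirmed figure for fairseq: the 16 bytes/parameter used here assumes full-precision Adam (fp32 weights, gradients, and two optimizer moments); the 13 bytes/parameter quoted by the GPT-NeoX devs presumably reflects a different (e.g. mixed-precision) setup.

```python
def training_footprint_gb(n_params, bytes_per_param=16):
    """Rough GPU memory needed just for weights + gradients + Adam state.

    16 bytes/param assumes fp32 Adam: 4 (weights) + 4 (gradients)
    + 4 (exp_avg) + 4 (exp_avg_sq). Activations and framework
    overhead come on top of this, so real usage is higher.
    """
    return n_params * bytes_per_param / 1e9

# Under this assumption, a 3.3B model needs ~52.8 GB for optimizer
# state alone, far beyond a single 24 GB RTX 3090, while a 600M
# model needs ~9.6 GB and plausibly fits.
print(f"3.3B model: {training_footprint_gb(3.3e9):.1f} GB")
print(f"600M model: {training_footprint_gb(600e6):.1f} GB")
```

By this estimate the 600M checkpoint is the largest dense NLLB model one could expect to fit on a single 24 GB card without memory-saving tricks (optimizer state sharding, CPU offload, etc.), which matches what the original poster observed.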
@zgerrard Hi, do you perhaps have a step-by-step tutorial on how to fine-tune the 600M model? It would be really helpful for me. Could you share your fine-tuning project via a git repository?
@edvardasast Did you find any git repository for fine-tuning?
unfortunately not :(
@edvardasast Would you please share your full steps for fine-tuning NLLB? Thanks!
I am getting the same error. It seems that preprocessing used the vocabulary built from my own data instead of the vocabulary of the pretrained NLLB model, which gives the model a different number of parameters.
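The vocabulary mismatch described above changes the model's shape: the embedding (and output projection) tables have one row per vocabulary entry, so a dictionary built from your own data produces a different parameter count than the pretrained checkpoint expects. A minimal sketch of the arithmetic, using assumed sizes (the NLLB-200 dictionary size and the 3.3B model's embedding dimension here are illustrative, not verified from this thread):

```python
# Assumption: the pretrained NLLB dictionary has roughly 256k entries,
# while a dictionary built from a small parallel corpus might have ~32k.
NLLB_VOCAB = 256_206   # assumed NLLB-200 dictionary size (illustrative)
MY_VOCAB = 32_000      # example vocab size built from one's own data
EMBED_DIM = 2048       # assumed embedding dim of the dense 3.3B model

def embedding_params(vocab_size, dim):
    """Parameters in one embedding table of shape (vocab_size, dim)."""
    return vocab_size * dim

# The two tables differ by hundreds of millions of parameters, so
# loading the pretrained checkpoint into the mismatched model fails.
print(embedding_params(NLLB_VOCAB, EMBED_DIM))
print(embedding_params(MY_VOCAB, EMBED_DIM))
```

In practice the usual fix is to reuse the pretrained model's dictionary when binarizing your data (fairseq-preprocess accepts `--srcdict`/`--tgtdict` for this) rather than letting preprocessing build a new vocabulary from your corpus.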
Where is the code for fine-tuning the NLLB model? Thanks!
❓ Questions and Help
What is your question?
Hi, I am getting "OOM during optimization, irrecoverable" when trying to fine-tune the 3.3B parameter NLLB model.
Stack trace:
Any ideas? Any help will be greatly appreciated.
What have you tried?
I tried fine-tuning smaller models; only the 600M-parameter (smallest) model didn't trigger the error above.
What's your environment?