Loss of base and large models #3
Hi,

I'm currently working on a new non-English ELECTRA model. Training on GPU seems to work and is running fine 🤗

The next step is to try training on a TPU, so I'd like to ask whether you could post the final loss of both the base and large models (or even share the loss training curves), so that we have a reference point when training our own models 🤔

Many thanks in advance,
Stefan
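For anyone attempting the same GPU-to-TPU switch, here is a minimal sketch of the hparams one might pass to `run_pretraining.py` for a TPU run. The field names (`use_tpu`, `num_tpu_cores`, `tpu_name`, `tpu_zone`, `model_size`) follow my reading of the repo's `configure_pretraining.py`, and the TPU name and zone are placeholders, so verify everything against the code before relying on it:

```python
# Sketch only: hparams for a hypothetical TPU run of run_pretraining.py.
# The field names are assumptions based on configure_pretraining.py;
# check them against the repo before use.
import json

hparams = {
    "model_size": "base",          # "small", "base", or "large"
    "use_tpu": True,
    "num_tpu_cores": 8,
    "tpu_name": "my-tpu-node",     # placeholder Cloud TPU name
    "tpu_zone": "europe-west4-a",  # placeholder zone
}

# The JSON string can then be passed on the command line, e.g.:
#   python3 run_pretraining.py --data-dir $DATA_DIR \
#       --model-name electra_base_own --hparams '<json printed below>'
print(json.dumps(hparams))
```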
Hi @clarkkev, thanks for these interesting insights! Great to have these training curves 🤗

@clarkkev Do you have any idea why there's a peak in disc_loss at the very beginning of the training curve?
@yaolu From our own experiments, the generator starts out making random predictions, then plateaus for a bit at the median prediction, so for a while every [MASK] gets replaced with the same token. Here's a sample validation prediction at step 1000: the red-highlighted tokens are the generator's replacements, and the yellow-highlighted tokens are the ones the discriminator predicts as corrupted. Note that the discriminator simply predicts that every […]
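To make the dynamics above concrete, here is a small sketch (my own illustration, not code from this repo) of the discriminator's per-position binary cross-entropy. Since ELECTRA masks and replaces roughly 15% of positions, a discriminator that just predicts the base rate everywhere already beats random guessing by a wide margin, which is one hedged reading of why the curve can move sharply early on:

```python
# Sketch (not repo code): ELECTRA's discriminator loss is a per-position
# binary cross-entropy, where the label marks whether the generator
# replaced that token. With ~15% of positions replaced, predicting the
# base rate at every position already gives a much lower loss than
# random scores.
import numpy as np

def disc_loss(logits, is_replaced):
    """logits: (seq_len,) scores; is_replaced: (seq_len,) 0/1 labels."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-8                             # numerical safety
    return -np.mean(is_replaced * np.log(probs + eps)
                    + (1.0 - is_replaced) * np.log(1.0 - probs + eps))

labels = np.zeros(512)
labels[:77] = 1.0                                # ~15% replaced tokens
print(disc_loss(np.zeros(512), labels))          # random scores: ~0.69
print(disc_loss(np.full(512, -1.73), labels))    # constant p≈0.15: ~0.42
```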
How did you draw the graph? |
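One common way to produce such a curve is to read the TensorBoard event files written during training and plot the scalar tags with matplotlib. A minimal sketch, where the run directory and the scalar tag name "loss" are placeholders to adjust for your own run:

```python
# Sketch: plot a training curve from TensorBoard event files.
# The run directory and the tag name "loss" are placeholders; list the
# available tags with acc.Tags() to find the right one for your run.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("models/electra_base")  # placeholder run directory
acc.Reload()

events = acc.Scalars("loss")                   # placeholder scalar tag
steps = [e.step for e in events]
values = [e.value for e in events]

plt.plot(steps, values)
plt.xlabel("train step")
plt.ylabel("loss")
plt.title("pretraining loss")
plt.show()
```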
|