Dear authors,

Thanks for open-sourcing your wonderful work.

You mention GPT in Figure 3 when comparing the Pareto front across different models ("AR models of the same size"). May I ask whether this is a pre-trained GPT (e.g. GPT2-small) fine-tuned on the LM1B dataset, or a model with the GPT architecture trained from scratch on the LM1B training set?
Thank you for your question! We include both models in Figure 3. The red curve, which lies rather close to our DiffusionBERT, corresponds to an AR model trained from scratch, and the green one to a fine-tuned GPT2. In general, DiffusionBERT still falls behind pretrained AR models in terms of generation quality.
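For reference, here is a minimal sketch (not the authors' training code) of how one might set up the two baselines with Hugging Face `transformers`: fine-tuning a pretrained GPT2-small on LM1B versus training the same architecture from scratch. The dataset identifier, sequence length, and hyperparameters are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch, assuming Hugging Face transformers/datasets; hyperparameters,
# max_length, and the "lm1b" Hub dataset id are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT2-small
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 defines no pad token

# Pretrained-then-fine-tuned baseline (the green curve in Figure 3):
model = AutoModelForCausalLM.from_pretrained("gpt2")
# From-scratch baseline (the red curve): same architecture, random init:
# model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))

# One Billion Word benchmark as mirrored on the HF Hub (assumed to match the
# split used in the paper).
dataset = load_dataset("lm1b", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM

args = TrainingArguments(
    output_dir="gpt2-lm1b",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    learning_rate=5e-5,
)
Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

Swapping `from_pretrained` for `from_config` is the only change needed to move between the two baselines, which keeps the comparison in Figure 3 architecture-matched.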