We use OpusTrainer to augment the data during both teacher training and student training. There is a quality gap in student training, and it would be good to understand how data augmentation affects training. It's particularly important to compare results on the augmented and clean flores evaluation sets.
Language pair: TODO
Experiment Splits

| Strategies | flores-devtest | flores-aug-devtest | Training Time |
| --- | --- | --- | --- |
| Run augmentation | | | |
| Disable augmentation | | | |
| Two stage: aug, no aug | | | |
| Two stage: no aug, aug | | | |
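Once the runs finish, the four strategies listed above could be compared with a small helper like the one below. This is only a sketch: the `StrategyResult` and `summarize` names, the choice of "Run augmentation" as the baseline, and the idea of reporting a clean-minus-augmented gap are assumptions made for illustration, not part of OpusTrainer or the training pipeline. The numbers in `example` are placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class StrategyResult:
    """One row of the comparison table (hypothetical structure)."""
    strategy: str
    clean_score: float     # metric on flores-devtest
    aug_score: float       # metric on flores-aug-devtest
    training_hours: float  # wall-clock training time

    @property
    def robustness_gap(self) -> float:
        # How much the score drops when the eval set is augmented.
        return self.clean_score - self.aug_score


def summarize(results, baseline="Run augmentation"):
    """Express every strategy relative to the baseline strategy."""
    base = next(r for r in results if r.strategy == baseline)
    rows = []
    for r in results:
        rows.append({
            "strategy": r.strategy,
            "clean_delta": round(r.clean_score - base.clean_score, 2),
            "aug_delta": round(r.aug_score - base.aug_score, 2),
            "gap": round(r.robustness_gap, 2),
            "time_ratio": round(r.training_hours / base.training_hours, 2),
        })
    return rows


# PLACEHOLDER values only, chosen to exercise the code; not real results.
example = [
    StrategyResult("Run augmentation", 30.0, 29.0, 100.0),
    StrategyResult("Disable augmentation", 30.0, 25.0, 80.0),
]

for row in summarize(example):
    print(row)
```

Reporting the deltas against a single baseline keeps the two questions separate: whether augmentation changes quality on clean data, and whether robustness to augmented input is lost without it.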
Hypothesis
Augmentation increases student training time. Augmentation behavior may not be learned without augmented data.
I'm going to run a quick experiment for en-ru from the same branch to see the side-by-side effect of disabling augmentation. We don't have to wait for the full training to finish; a couple of days will be enough to see the difference in the validation curves.
I completed training an en-ru student with no augmentation. Evals on the non-augmented datasets are similar, and evals on the augmented ones are worse, so we can conclude that data augmentation doesn't affect the distillation quality gap.
An experiment for #231