
Figure out the behavior of OpusTrainer augmentation on student distillation gap #773

Closed
gregtatum opened this issue Jul 29, 2024 · 3 comments
Assignees
Labels
experiment A training experiment with hypothesis and results

Comments

@gregtatum
Member

An experiment for #231

We use OpusTrainer to augment the data during training, both for teacher training and for student training. There is a distillation quality gap in student training, and it would be good to understand how data augmentation affects it. It is particularly important to compare results on the augmented and clean Flores sets.
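For context, OpusTrainer augments training pairs on the fly by applying modifiers such as upper-casing and title-casing to a configurable fraction of lines. A minimal Python sketch of the idea (the ratios are made up for illustration, and this is not OpusTrainer's actual implementation):

```python
import random

def augment(src: str, trg: str, upper_ratio: float = 0.05, title_ratio: float = 0.05):
    """Apply a random casing modifier to a sentence pair, or pass it through clean."""
    r = random.random()
    if r < upper_ratio:
        # Upper-case both sides; this is the behavior probed by flores-aug-upper.
        return src.upper(), trg.upper()
    if r < upper_ratio + title_ratio:
        # Title-case both sides, one of the modifiers in the augmentation mix.
        return src.title(), trg.title()
    # Most pairs pass through unmodified ("clean" data).
    return src, trg

print(augment("the quick brown fox", "быстрая коричневая лиса"))
```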

Language pair: TODO

Experiment Splits

| Strategies | flores-devtest | flores-aug-devtest | Training Time |
| --- | --- | --- | --- |
| Run augmentation | | | |
| Disable augmentation | | | |
| Two stage: aug, no aug | | | |
| Two stage: no aug, aug | | | |

Hypothesis

Augmentation increases student training time. The behavior targeted by augmentation (e.g. handling upper-cased or noisy input) may not be learned without augmented training data.

@gregtatum gregtatum added the experiment A training experiment with hypothesis and results label Jul 29, 2024
@gregtatum gregtatum assigned gregtatum and unassigned gregtatum Jul 29, 2024
@gregtatum
Member Author

We'll do #771 and #772 first.

@eu9ene
Collaborator

eu9ene commented Oct 3, 2024

I'm going to run a quick experiment for en-ru from the same branch to see the side-by-side effect of disabling augmentation. We don't have to wait for the full training; a couple of days will be enough to see the difference in the validation curves.

@eu9ene eu9ene self-assigned this Oct 9, 2024
@eu9ene
Collaborator

eu9ene commented Oct 9, 2024

I completed training of the en-ru student with no augmentation. The evals on the non-augmented dataset are similar, while the evals on the augmented ones are worse, so we can conclude that data augmentation doesn't affect the distillation quality gap.

No augmentation student and evals: https://firefox-ci-tc.services.mozilla.com/tasks/groups/FQ0mxIvFSMiLakX3uxk0uA
Student with augmentation: https://firefox-ci-tc.services.mozilla.com/tasks/groups/CbqKRgg6QKuoGWa8n634Eg

| Strategies | flores-devtest | flores-aug-mix | flores-aug-upper | Training Time |
| --- | --- | --- | --- | --- |
| Run augmentation | 0.8562 | 0.8493 | 0.7942 | 15 days |
| Disable augmentation | 0.855 | 0.7885 | 0.3944 | 5 days |
(Screenshot attached: 2024-10-09 12:58 PM)
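To make the gap concrete, here is a small Python sketch that computes each strategy's score drop on the augmented sets relative to its clean flores-devtest score, using the numbers from the table above (higher scores are better; the metric is not named in this thread):

```python
# Values copied from the results table above.
scores = {
    "Run augmentation":     {"flores-devtest": 0.8562, "flores-aug-mix": 0.8493, "flores-aug-upper": 0.7942},
    "Disable augmentation": {"flores-devtest": 0.8550, "flores-aug-mix": 0.7885, "flores-aug-upper": 0.3944},
}

for strategy, evals in scores.items():
    clean = evals["flores-devtest"]
    # Drop of each augmented eval relative to the clean devtest score.
    drops = {name: round(clean - score, 4) for name, score in evals.items() if name != "flores-devtest"}
    print(f"{strategy}: drop vs clean devtest = {drops}")
```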

@eu9ene eu9ene closed this as completed Oct 9, 2024