
Figure out the behavior of OpusTrainer augmentation on student distillation gap #773

Closed
gregtatum opened this issue Jul 29, 2024 · 3 comments
Assignees
Labels
experiment A training experiment with hypothesis and results

Comments

@gregtatum
Member

An experiment for #231

We use OpusTrainer to augment the data during training, both for teacher training and for student training. There is a distillation quality gap in student training, and it would be good to understand how data augmentation affects it. It is particularly important to compare results on the augmented and clean Flores sets.
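For context, OpusTrainer augments training pairs on the fly by applying modifiers such as upper-casing and title-casing to a configurable fraction of lines. A minimal Python sketch of the idea (the ratios are made up for illustration, and this is not OpusTrainer's actual implementation):

```python
import random

def augment(src: str, trg: str, upper_ratio: float = 0.05, title_ratio: float = 0.05):
    """Apply a random casing modifier to a sentence pair, or pass it through clean."""
    r = random.random()
    if r < upper_ratio:
        # Upper-case both sides; this is the behavior probed by flores-aug-upper.
        return src.upper(), trg.upper()
    if r < upper_ratio + title_ratio:
        # Title-case both sides, one of the modifiers in the augmentation mix.
        return src.title(), trg.title()
    # Most pairs pass through unmodified ("clean" data).
    return src, trg

print(augment("the quick brown fox", "быстрая коричневая лиса"))
```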

Language pair: TODO

Experiment Splits

| Strategies | flores-devtest | flores-aug-devtest | Training Time |
| --- | --- | --- | --- |
| Run augmentation | | | |
| Disable augmentation | | | |
| Two stage: aug, no aug | | | |
| Two stage: no aug, aug | | | |

Hypothesis

Augmentation increases student training time. The behavior targeted by augmentation (e.g. handling upper-cased or noisy input) may not be learned without augmented training data.

@gregtatum gregtatum added the experiment A training experiment with hypothesis and results label Jul 29, 2024
@gregtatum gregtatum assigned gregtatum and unassigned gregtatum Jul 29, 2024
@gregtatum
Member Author

We'll do #771 and #772 first.

@eu9ene
Collaborator

eu9ene commented Oct 3, 2024

I'm going to run a quick experiment for en-ru from the same branch to see the side-by-side effect of disabling augmentation. We don't have to wait for the full training; a couple of days will be enough to see the difference in the validation curves.

@eu9ene eu9ene self-assigned this Oct 9, 2024
@eu9ene
Collaborator

eu9ene commented Oct 9, 2024

I completed training of the en-ru student with no augmentation. The evals on the non-augmented dataset are similar, while the evals on the augmented ones are worse, so we can conclude that data augmentation doesn't affect the distillation quality gap.

No augmentation student and evals: https://firefox-ci-tc.services.mozilla.com/tasks/groups/FQ0mxIvFSMiLakX3uxk0uA
Student with augmentation: https://firefox-ci-tc.services.mozilla.com/tasks/groups/CbqKRgg6QKuoGWa8n634Eg

| Strategies | flores-devtest | flores-aug-mix | flores-aug-upper | Training Time |
| --- | --- | --- | --- | --- |
| Run augmentation | 0.8562 | 0.8493 | 0.7942 | 15 days |
| Disable augmentation | 0.855 | 0.7885 | 0.3944 | 5 days |
(Screenshot attached: 2024-10-09 12:58 PM)
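To make the gap concrete, here is a small Python sketch that computes each strategy's score drop on the augmented sets relative to its clean flores-devtest score, using the numbers from the table above (higher scores are better; the metric is not named in this thread):

```python
# Values copied from the results table above.
scores = {
    "Run augmentation":     {"flores-devtest": 0.8562, "flores-aug-mix": 0.8493, "flores-aug-upper": 0.7942},
    "Disable augmentation": {"flores-devtest": 0.8550, "flores-aug-mix": 0.7885, "flores-aug-upper": 0.3944},
}

for strategy, evals in scores.items():
    clean = evals["flores-devtest"]
    # Drop of each augmented eval relative to the clean devtest score.
    drops = {name: round(clean - score, 4) for name, score in evals.items() if name != "flores-devtest"}
    print(f"{strategy}: drop vs clean devtest = {drops}")
```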

@eu9ene eu9ene closed this as completed Oct 9, 2024