Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish evaluation metrics #598

Merged
merged 15 commits into from
May 22, 2024
Merged

Publish evaluation metrics #598

merged 15 commits into from
May 22, 2024

Conversation

vrigal
Copy link
Contributor

@vrigal vrigal commented May 15, 2024

Based on #589 (required for valid evaluation task naming).

Closes #519

Based on Bastien's work (#558)

[skip ci]

@vrigal vrigal force-pushed the publish-eval branch 2 times, most recently from 3e1ba35 to cf387a3 Compare May 16, 2024 08:08
@vrigal
Copy link
Contributor Author

vrigal commented May 16, 2024

I tried testing by setting {"training_config": { "wandb-publication": True }} in the training configuration (taskcluster.translations_taskgraph.parameters) and change parameters to avoid caching, all in a temporary commit (TRASHME).

However no train nor evaluation task is started in Taskcluster (marked unscheduled). @eu9ene do you have right to start this task for example https://firefox-ci-tc.services.mozilla.com/tasks/ce0zK1kgSxC_cwZesDXPLQ ? (if it works it should publish that eval metric to a new group ci_b_lxggt6TXCnpi1qsjeIUQ on moz-translations workspace project ru-en).

@vrigal vrigal marked this pull request as ready for review May 16, 2024 08:52
@vrigal vrigal requested a review from a team as a code owner May 16, 2024 08:52
@vrigal vrigal requested a review from gbrownmozilla May 16, 2024 08:52
@bhearsum bhearsum requested review from eu9ene and removed request for a team and gbrownmozilla May 16, 2024 14:20
@eu9ene
Copy link
Collaborator

eu9ene commented May 16, 2024

I tried testing by setting {"training_config": { "wandb-publication": True }} in the training configuration (taskcluster.translations_taskgraph.parameters) and change parameters to avoid caching, all in a temporary commit (TRASHME).

However no train nor evaluation task is started in Taskcluster (marked unscheduled). @eu9ene do you have right to start this task for example https://firefox-ci-tc.services.mozilla.com/tasks/ce0zK1kgSxC_cwZesDXPLQ ? (if it works it should publish that eval metric to a new group ci_b_lxggt6TXCnpi1qsjeIUQ on moz-translations workspace project ru-en).

The train-backwards task is failed, that's why it doesn't start. It's something about: Publication failed: argument --wandb-project: conflicting option string: --wandb-project

@vrigal vrigal force-pushed the publish-eval branch 3 times, most recently from 1660452 to 8543142 Compare May 17, 2024 12:15
@vrigal vrigal force-pushed the publish-eval branch 2 times, most recently from d630a0e to ba1b9cd Compare May 17, 2024 14:34
@eu9ene
Copy link
Collaborator

eu9ene commented May 17, 2024

Depends on #611 to run CI

@eu9ene
Copy link
Collaborator

eu9ene commented May 17, 2024

We should not forget to check that the newly added COMET metric is being published.

Copy link
Collaborator

@eu9ene eu9ene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely work after the bug fix but we should refactor the code reuse without adding arguments to another script. Also please double check that the recently added COMET metric is working. We'll need green CI and everything published on W&B to verify that this works. Thanks for pushing on this one, we really need it for the release!

taskcluster/kinds/evaluate-quantized/kind.yml Outdated Show resolved Hide resolved
taskcluster/kinds/evaluate-teacher-ensemble/kind.yml Outdated Show resolved Hide resolved
pipeline/eval/eval.py Show resolved Hide resolved
@vrigal
Copy link
Contributor Author

vrigal commented May 21, 2024

Also please double check that the recently added COMET metric is working.

This metric was not specified. I will open a new Issue/PR for this.

It is actually ignored by the online publication. Actual support for .metrics files will not work (from old experiments and offline Taskcluster uploads). It will raise AssertionError: file must contain exactly 2 float values.

)
logger.info("Initializing Weight & Biases client")
# Allow publishing metrics as a table on existing runs (i.e. previous trainings)
wandb.open(resume=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also benefit from #610.

@vrigal vrigal force-pushed the publish-eval branch 2 times, most recently from 9e92d78 to 9d6cfa2 Compare May 21, 2024 09:36
@vrigal
Copy link
Contributor Author

vrigal commented May 21, 2024

Evaluation data has been published to W&B: https://wandb.ai/moz-translations/ru-en/groups/ci_bVp4cnGJSgCeY2pK5aKK0A/workspace

I found two other things:

  • Teacher-1 is not correctly parsed by the regex (label evaluate-teacher-flores-devtest-ru-en-1), because it detects ru-en-1 as part of the dataset (instead of the language at the end of the label). In case that extra language part is always present, it will be a 1 character patch.
  • I noticed that simple logs can be displayed as bar charts on W&B, once the grouping option is disabled (see our custom workspace). That would be a great option in order to remove Tables/custom charts for evaluation metrics and homogenize all charts in the future.

⚠️ note: Commit 9d6cfa2 must be dropped before merging

@eu9ene
Copy link
Collaborator

eu9ene commented May 21, 2024

@vrigal things don't look right here: https://wandb.ai/moz-translations/ru-en/groups/ci_bVp4cnGJSgCeY2pK5aKK0A/workspace. Evals use different run names than training:
Screenshot 2024-05-21 at 3 49 46 PM

Teacher-1 is not correctly parsed by the regex (label evaluate-teacher-flores-devtest-ru-en-1), because it detects ru-en-1 as part of the dataset (instead of the language at the end of the label). In case that extra language part is always present, it will be a 1 character patch.

Yes, I see, this is a must-fix before merging. Please also add "sacrebleu_aug-upper_wmt19" to https://github.com/mozilla/firefox-translations-training/blob/f5a6af83849f08cce3e7b8b9d06cc2a0456f2c68/taskcluster/translations_taskgraph/parameters.py#L91 so that we can tests that the augmentation part of the dataset is parsed correctly.

@vrigal
Copy link
Contributor Author

vrigal commented May 22, 2024

Publication should be fine now (flores & sacrebleu): https://wandb.ai/moz-translations/ru-en/groups/ci_Z45R3e22TWCzyqQ5UFdWdQ/workspace

I updated the regex so it supports labels ending with -1, merged run names in the label parser (backward -> backwards and finetuned -> finetune) and dropped the temporary commit used to trigger the complete CI.

@vrigal vrigal mentioned this pull request May 22, 2024
@eu9ene
Copy link
Collaborator

eu9ene commented May 22, 2024

The issue in CI is #549

@eu9ene eu9ene merged commit 8a1d8ef into mozilla:main May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Publish evals from Taskcluster
3 participants