Publish evaluation metrics #598
Conversation
Force-pushed from 3e1ba35 to cf387a3.
I tried testing by setting it, however no train nor evaluation task is started in Taskcluster (they are marked unscheduled). @eu9ene do you have the rights to start this task, for example https://firefox-ci-tc.services.mozilla.com/tasks/ce0zK1kgSxC_cwZesDXPLQ? (If it works, it should publish that eval metric to a new group.)
The train-backwards task has failed, which is why this one doesn't start. It's something about:
Force-pushed from 1660452 to 8543142.
Force-pushed from d630a0e to ba1b9cd.
Depends on #611 to run CI.
We should not forget to check that the newly added COMET metric is being published.
This will likely work after the bug fix, but we should refactor to reuse the code rather than adding arguments to another script. Also, please double-check that the recently added COMET metric is working. We'll need green CI and everything published on W&B to verify that this works. Thanks for pushing on this one, we really need it for the release!
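One way to act on the refactoring suggestion above, without adding arguments to another script, is to extract the publication logic into a small shared helper that both the training and evaluation entry points import. This is only a sketch; the module, class, and function names are hypothetical and not taken from the repository:

```python
# Hypothetical shared helper; names are illustrative, not from the repo.
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class MetricRecord:
    importer: str  # e.g. "flores"
    dataset: str   # e.g. "devtest"
    metric: str    # e.g. "bleu", "chrf", "comet"
    score: float


def format_rows(records: Iterable[MetricRecord]) -> Tuple[List[str], List[list]]:
    """Turn metric records into (columns, rows), e.g. for a W&B table."""
    columns = ["importer", "dataset", "metric", "score"]
    rows = [[r.importer, r.dataset, r.metric, r.score] for r in records]
    return columns, rows
```

Both the eval and training scripts could then call such a helper and hand the result to their own publication client, so a newly added metric like COMET only needs to be wired in one place.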
This metric was not specified; I will open a new issue/PR for it. It is actually ignored by the online publication. Actual support for
pipeline/eval/eval.py (outdated):

```python
)
logger.info("Initializing Weights & Biases client")
# Allow publishing metrics as a table on existing runs (i.e. previous trainings)
wandb.init(resume=True)
```
This would also benefit from #610.
Force-pushed from 9e92d78 to 9d6cfa2.
Evaluation data has been published to W&B: https://wandb.ai/moz-translations/ru-en/groups/ci_bVp4cnGJSgCeY2pK5aKK0A/workspace. I found two other things:
@vrigal things don't look right here: https://wandb.ai/moz-translations/ru-en/groups/ci_bVp4cnGJSgCeY2pK5aKK0A/workspace. Evals use different run names than training:
Yes, I see, this is a must-fix before merging. Please also add
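The run-name mismatch above could be addressed by deriving one canonical run name from either kind of task label, so eval metrics attach to the same W&B run as the training logs. The label shapes below are assumptions for illustration, not the exact Taskcluster naming:

```python
# Illustrative sketch: derive a canonical W&B run name from a task label.
# Assumed (hypothetical) label shapes: "train-teacher-ru-en-1" and
# "evaluate-teacher-flores-devtest-ru-en-1" should both map to "teacher-1".
def run_name_from_label(label: str) -> str:
    parts = label.split("-")
    model = parts[1]  # component after the "train"/"evaluate" prefix
    suffix = parts[-1] if parts[-1].isdigit() else None
    return f"{model}-{suffix}" if suffix else model
```

With a helper like this, the training and evaluation publishers would agree on the run name without coordinating through extra script arguments.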
Publication should be fine now (flores & sacrebleu): https://wandb.ai/moz-translations/ru-en/groups/ci_Z45R3e22TWCzyqQ5UFdWdQ/workspace. I updated the regex so it supports labels ending with
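The exact suffix the updated regex accepts is not shown in the thread. As a general illustration of the technique, a trailing component can be made optional in a label pattern with a non-capturing optional group; the pattern and label shape below are hypothetical, not the one in the pipeline:

```python
import re

# Hypothetical evaluation-task label pattern; the real one lives in the pipeline.
LABEL_RE = re.compile(
    r"^evaluate-"
    r"(?P<model>[a-z]+)-"        # e.g. "teacher"
    r"(?P<importer>[a-z_]+)-"    # e.g. "flores"
    r"(?P<dataset>[a-z_]+)-"     # e.g. "devtest"
    r"(?P<src>[a-z]{2})-(?P<trg>[a-z]{2})"
    r"(?:-(?P<suffix>\d+))?$"    # optional trailing numeric suffix
)
```

The `(?:-(?P<suffix>\d+))?` group matches labels both with and without the trailing component, and `m.group("suffix")` is simply `None` when it is absent.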
The issue in CI is #549.
Based on #589 (required for valid evaluation task naming).
Closes #519
Based on Bastien's work (#558)
[skip ci]