Errors in Marian logs parser #961

Open
eu9ene opened this issue Dec 17, 2024 · 2 comments

eu9ene commented Dec 17, 2024

https://firefox-ci-tc.services.mozilla.com/tasks/JQuHsJS2R6-RuPR-O9XMXQ/runs/0/logs/public/logs/live.log

[task 2024-12-05T22:32:10.545Z] [tracking INFO] Fetching the experiment for task "JQuHsJS2R6-RuPR-O9XMXQ" to check if this is running in CI.
[task 2024-12-05T22:32:11.278Z] [tracking ERROR] Publication failed! The error is ignored to not break training, but it should be fixed.
[task 2024-12-05T22:32:11.278Z] Traceback (most recent call last):
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/cli/taskcluster.py", line 174, in main
[task 2024-12-05T22:32:11.278Z]     boot()
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/cli/taskcluster.py", line 165, in boot
[task 2024-12-05T22:32:11.278Z]     parser.run()
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 449, in run
[task 2024-12-05T22:32:11.278Z]     self.parse()
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 418, in parse
[task 2024-12-05T22:32:11.278Z]     self.parse_data(logs_iter)
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 384, in parse_data
[task 2024-12-05T22:32:11.278Z]     headers, text = next(logs_iter)
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 224, in _iter_log_entries
[task 2024-12-05T22:32:11.278Z]     tag = _join(marian_tags)
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 217, in _join
[task 2024-12-05T22:32:11.278Z]     return _join([_join(item) for item in seq])
[task 2024-12-05T22:32:11.278Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 216, in _join
[task 2024-12-05T22:32:11.278Z]     return "_".join(seq)

After adding a naive fix that checks for None, the parser no longer crashes, but nothing seems to be tracked on the dashboards. All charts are empty for: https://wandb.ai/moz-translations/zh-en?nw=nwuserepavlov
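
For reference, the traceback ends at `"_".join(seq)` inside a recursive `_join`, and `str.join` raises `TypeError` when any element is not a string, which fits the None check mentioned above. Below is a minimal sketch of a join that tolerates None and empty items. The name `join_tags` and the input shapes are assumptions for illustration; this is not the actual `translations_parser` code.

```python
def join_tags(seq) -> str:
    """Flatten nested tag groups into one underscore-separated string.

    Hypothetical helper: None and empty strings are skipped instead of
    being passed to str.join, which would raise TypeError on None.
    """
    parts = []
    for item in seq:
        if item is None or item == "":
            continue  # skip missing tags rather than failing
        if isinstance(item, str):
            parts.append(item)
        else:
            # Assume any non-string item is a nested iterable of tags.
            nested = join_tags(item)
            if nested:
                parts.append(nested)
    return "_".join(parts)
```

Skipping missing tags keeps the parser alive, but it can also hide the underlying cause, so logging the offending line before skipping may be worth it.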

Current training log: https://firefox-ci-tc.services.mozilla.com/tasks/Tiqin_UwSJSd8YDULbaV0g/runs/0/logs/live/public/logs/live.log

eu9ene added the bug (Something is broken or not correct) and weights and biases (Integration with Weights and Biases) labels Dec 17, 2024
eu9ene commented Dec 17, 2024

Also, an issue on task preemption:

[task 2024-12-17T16:38:34.130Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/opustrainer/trainer.py", line 535, in _load_modifier
[task 2024-12-17T16:38:34.130Z] [tracking DEBUG] Marian header not found in: headers=[] text=File "/home/ubuntu/.local/lib/python3.10/site-packages/opustrainer/trainer.py", line 535, in _load_modifier
[task 2024-12-17T16:38:34.130Z]     raise CurriculumLoaderError(f"could not initialize modifier '{name}': {exc!s}") from exc
[task 2024-12-17T16:38:34.130Z] [tracking DEBUG] Marian header not found in: headers=[] text=raise CurriculumLoaderError(f"could not initialize modifier '{name}': {exc!s}") from exc
[task 2024-12-17T16:38:34.130Z] opustrainer.trainer.CurriculumLoaderError: could not initialize modifier 'Tags': PlaceholderTagModifier.__init__() got an unexpected keyword argument 'tag'
[task 2024-12-17T16:38:34.130Z] [tracking DEBUG] Marian header not found in: headers=[] text=opustrainer.trainer.CurriculumLoaderError: could not initialize modifier 'Tags': PlaceholderTagModifier.__init__() got an unexpected keyword argument 'tag'
[task 2024-12-17T16:38:34.130Z] [tracking INFO] Fetching the experiment for task "Tiqin_UwSJSd8YDULbaV0g" to check if this is running in CI.
[task 2024-12-17T16:38:34.308Z] [tracking ERROR] Publication failed! The error is ignored to not break training, but it should be fixed.
[task 2024-12-17T16:38:34.308Z] Traceback (most recent call last):
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 332, in parse_marian_context
[task 2024-12-17T16:38:34.308Z]     headers, text = next(logs_iter)
[task 2024-12-17T16:38:34.308Z] StopIteration
[task 2024-12-17T16:38:34.308Z] 
[task 2024-12-17T16:38:34.308Z] During handling of the above exception, another exception occurred:
[task 2024-12-17T16:38:34.308Z] 
[task 2024-12-17T16:38:34.308Z] Traceback (most recent call last):
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/cli/taskcluster.py", line 174, in main
[task 2024-12-17T16:38:34.308Z]     boot()
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/cli/taskcluster.py", line 165, in boot
[task 2024-12-17T16:38:34.308Z]     parser.run()
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 449, in run
[task 2024-12-17T16:38:34.308Z]     self.parse()
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 412, in parse
[task 2024-12-17T16:38:34.308Z]     self.parse_marian_context(copy)
[task 2024-12-17T16:38:34.308Z]   File "/home/ubuntu/.local/lib/python3.10/site-packages/translations_parser/parser.py", line 335, in parse_marian_context
[task 2024-12-17T16:38:34.308Z]     raise ValueError("Could not find a [marian] entry in the training log.")
[task 2024-12-17T16:38:34.308Z] ValueError: Could not find a [marian] entry in the training log.
[task 2024-12-17T16:38:36.270Z] Traceback (most recent call last):
[task 2024-12-17T16:38:36.270Z]   File "/home/ubuntu/tasks/task_173445166741057/checkouts/vcs/pipeline/train/train.py", line 475, in <module>
[task 2024-12-17T16:38:36.270Z]     main()
[task 2024-12-17T16:38:36.270Z]   File "/home/ubuntu/tasks/task_173445166741057/checkouts/vcs/pipeline/train/train.py", line 471, in main
[task 2024-12-17T16:38:36.270Z]     train_cli.run_training()
[task 2024-12-17T16:38:36.270Z]   File "/home/ubuntu/tasks/task_173445166741057/checkouts/vcs/pipeline/train/train.py", line 375, in run_training
[task 2024-12-17T16:38:36.271Z]     shutil.copy(
[task 2024-12-17T16:38:36.271Z]   File "/usr/lib/python3.10/shutil.py", line 417, in copy
[task 2024-12-17T16:38:36.297Z]     copyfile(src, dst, follow_symlinks=follow_symlinks)
[task 2024-12-17T16:38:36.297Z]   File "/usr/lib/python3.10/shutil.py", line 254, in copyfile
[task 2024-12-17T16:38:36.297Z]     with open(src, 'rb') as fsrc:
[task 2024-12-17T16:38:36.297Z] FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/tasks/task_173445166741057/artifacts/model.npz.best-chrf.npz'
[task 2024-12-17T16:38:36.388Z] Traceback (most recent call last):
[task 2024-12-17T16:38:36.388Z]   File "/home/ubuntu/tasks/task_173445166741057/./checkouts/vcs/taskcluster/scripts/pipeline/train_taskcluster.py", line 137, in <module>
[task 2024-12-17T16:38:36.388Z]     main(sys.argv[1:])
[task 2024-12-17T16:38:36.388Z]   File "/home/ubuntu/tasks/task_173445166741057/./checkouts/vcs/taskcluster/scripts/pipeline/train_taskcluster.py", line 133, in main
[task 2024-12-17T16:38:36.388Z]     subprocess.run([TRAINING_SCRIPT, *script_args], check=True)
[task 2024-12-17T16:38:36.388Z]   File "/usr/lib/python3.10/subprocess.py", line 526, in run
[task 2024-12-17T16:38:36.422Z]     raise CalledProcessError(retcode, process.args,
[task 2024-12-17T16:38:36.422Z] subprocess.CalledProcessError: Command '['/home/ubuntu/tasks/task_173445166741057/./checkouts/vcs/taskcluster/scripts/pipeline/train-taskcluster.sh', 'teacher', 'train', 'zh', 'en', '/home/ubuntu/tasks/task_173445166741057/fetches/corpus.tok-icu,/home/ubuntu/tasks/task_173445166741057/fetches/mono.tok-icu', '/home/ubuntu/tasks/task_173445166741057/fetches/devset', '/home/ubuntu/tasks/task_173445166741057/artifacts', 'chrf', '/home/ubuntu/tasks/task_173445166741057/fetches/corpus.aln.zst,/home/ubuntu/tasks/task_173445166741057/fetches/mono.aln.zst', '1', 'two-stage', 'None', 'None', 'None', '--early-stopping', '20']' returned non-zero exit status 1.
[fetches 2024-12-17T16:38:36.448Z] removing /home/ubuntu/tasks/task_173445166741057/fetches

https://firefox-ci-tc.services.mozilla.com/tasks/Tiqin_UwSJSd8YDULbaV0g/runs/1/logs/public/logs/live.log
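
In the log above, `next(logs_iter)` raises `StopIteration`, which is then re-raised as `ValueError: Could not find a [marian] entry in the training log.` A minimal sketch of treating a missing entry as an ordinary outcome, as seems to happen here when training fails before Marian logs anything, is shown below. The function names, the bracketed-header regex, and the sample Marian line are assumptions for illustration, not the parser's real implementation.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed line shape: zero or more "[header]" groups followed by free text.
HEADER_RE = re.compile(r"\[([^\]]*)\]\s*")


def split_headers(line: str) -> Tuple[List[str], str]:
    """Split leading "[...]" groups from the rest of a log line."""
    headers: List[str] = []
    pos = 0
    while True:
        match = HEADER_RE.match(line, pos)
        if match is None:
            break
        headers.append(match.group(1))
        pos = match.end()
    return headers, line[pos:].strip()


def find_marian_entry(lines: Iterable[str]) -> Optional[Tuple[List[str], str]]:
    """Return the first entry carrying a "marian" header, or None.

    next() with a default avoids StopIteration, so the caller can decide
    whether a missing entry is an error or simply "nothing to publish".
    """
    entries = (split_headers(line) for line in lines)
    return next((entry for entry in entries if "marian" in entry[0]), None)


# Usage with lines taken from the log above: no Marian entry is present.
preempted = [
    "    raise CurriculumLoaderError(...) from exc",
    "opustrainer.trainer.CurriculumLoaderError: could not initialize modifier 'Tags'",
]
if find_marian_entry(preempted) is None:
    print("No [marian] entry found; skipping publication.")
```

Returning None leaves the decision to the caller: a run that failed before Marian started could be skipped quietly, while a completed run with no Marian entry could still be reported as an error.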

eu9ene added the blocker (Very important issue that blocks training) label Dec 17, 2024
eu9ene self-assigned this Dec 17, 2024
eu9ene commented Dec 21, 2024

Hmm, I can see the charts in W&B now, so I think the empty dashboards were a bug on their side. I tried parsing the logs locally and it works fine. I think the original issue was caused by some unexpected Marian logging, likely log strings with empty alignments.

eu9ene removed the blocker (Very important issue that blocks training) label Dec 21, 2024