Japanese is missing in OpusCleaner #943

Open
Tracked by #425
eu9ene opened this issue Nov 25, 2024 · 1 comment
Labels
language-coverage Issues related to covering specific languages

Comments

Collaborator

eu9ene commented Nov 25, 2024

[0/5:alpha_ratio] usage: alpha_ratio.py [-h] [--ratio-words-src RATIO_WORDS_SRC]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--ratio-words-trg RATIO_WORDS_TRG]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--ratio-alpha-src RATIO_ALPHA_SRC]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--ratio-alpha-trg RATIO_ALPHA_TRG]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--src-lang {ar,bg,bs,bn,ca,cs,da,de,en,el,es,et,eu,fi,fr,ga,gl,hi,hr,hu,hy,id,is,it,ko,lt,lv,mt,nb,nl,no,nn,pl,pt,ro,ru,sk,sl,sr,sv,tr,uk,zh,vi}]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--trg-lang {ar,bg,bs,bn,ca,cs,da,de,en,el,es,et,eu,fi,fr,ga,gl,hi,hr,hu,hy,id,is,it,ko,lt,lv,mt,nb,nl,no,nn,pl,pt,ro,ru,sk,sl,sr,sv,tr,uk,zh,vi}]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio]                       [--debug]
[task 2024-11-25T19:14:04.443Z] [0/5:alpha_ratio] alpha_ratio.py: error: argument --src-lang: invalid choice: 'ja' (choose from 'ar', 'bg', 'bs', 'bn', 'ca', 'cs', 'da', 'de', 'en', 'el', 'es', 'et', 'eu', 'fi', 'fr', 'ga', 'gl', 'hi', 'hr', 'hu', 'hy', 'id', 'is', 'it', 'ko', 'lt', 'lv', 'mt', 'nb', 'nl', 'no', 'nn', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sr', 'sv', 'tr', 'uk', 'zh', 'vi')
[task 2024-11-25T19:14:04.454Z] [0/4:fix_wiki] Traceback (most recent call last):
[task 2024-11-25T19:14:04.454Z] [0/4:fix_wiki]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/filters/./fix_wiki.py", line 97, in <module>
[task 2024-11-25T19:14:04.454Z] [0/4:fix_wiki]     print("\t".join(fields))
[task 2024-11-25T19:14:04.454Z] [0/4:fix_wiki] BrokenPipeError: [Errno 32] Broken pipe
[task 2024-11-25T19:14:04.462Z] [0/3:max_length] Traceback (most recent call last):
[task 2024-11-25T19:14:04.462Z] [0/3:max_length]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/filters/./max_length.py", line 47, in <module>
[task 2024-11-25T19:14:04.462Z] [0/3:max_length]     clean_parallel(args.max_length, args.min_length, args.debug)
[task 2024-11-25T19:14:04.462Z] [0/3:max_length]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/filters/./max_length.py", line 40, in clean_parallel
[task 2024-11-25T19:14:04.462Z] [0/3:max_length]     stdout.write(line)
[task 2024-11-25T19:14:04.467Z] [0/3:max_length] BrokenPipeError: [Errno 32] Broken pipe
[task 2024-11-25T19:14:04.767Z] Traceback (most recent call last):
[task 2024-11-25T19:14:04.767Z]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/clean.py", line 410, in run_pipeline
[task 2024-11-25T19:14:04.767Z]     with logging.span('run_pipeline_batch', batch_index=batch_index), \
[task 2024-11-25T19:14:04.767Z]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/logging.py", line 270, in __exit__
[task 2024-11-25T19:14:04.767Z]     super().__exit__(typ, value, traceback)
[task 2024-11-25T19:14:04.767Z]   File "/builds/worker/.local/lib/python3.10/site-packages/opuscleaner/clean.py", line 234, in __exit__
[task 2024-11-25T19:14:04.767Z]     raise RuntimeError(f"Child {problem_child.name} (pid {problem_child.process.pid}) exited with {problem_child.process.returncode}")
[task 2024-11-25T19:14:04.767Z] RuntimeError: Child 0/5:alpha_ratio (pid 334) exited with 2
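
For context, the usage message in the log has the shape argparse prints when a value is not in an argument's `choices`, and argparse exits with status 2 in that case, which matches `exited with 2` above. The `BrokenPipeError` tracebacks from `fix_wiki` and `max_length` appear to be knock-on effects of the pipeline shutting down after that exit. Below is a minimal sketch of that failure mode. It is not the actual `alpha_ratio.py` code: the parser, the abbreviated language list, and the helper `exit_code_for` are hypothetical and only illustrate the behaviour.

```python
import argparse

# Hypothetical, abbreviated list for illustration; the real filter accepts
# the language codes shown in the usage message in the log above.
SUPPORTED_LANGS = ["de", "en", "ko", "zh"]


def build_parser(langs: list[str]) -> argparse.ArgumentParser:
    """Build a parser that restricts language codes via argparse `choices`."""
    parser = argparse.ArgumentParser(prog="alpha_ratio_sketch.py")
    parser.add_argument("--src-lang", choices=langs, default="en")
    parser.add_argument("--trg-lang", choices=langs, default="en")
    return parser


def exit_code_for(argv: list[str], langs: list[str]) -> int:
    """Return the exit status argparse would produce for argv (0 on success)."""
    try:
        build_parser(langs).parse_args(argv)
    except SystemExit as exc:
        # argparse reports invalid choices on stderr and exits with status 2.
        return int(exc.code)
    return 0


if __name__ == "__main__":
    # 'ja' is rejected while it is missing from the list of choices.
    print(exit_code_for(["--src-lang", "ja"], SUPPORTED_LANGS))
    # Once 'ja' is part of the list, the same arguments parse cleanly.
    print(exit_code_for(["--src-lang", "ja"], SUPPORTED_LANGS + ["ja"]))
```

Under these assumptions, the fix amounts to extending the list of accepted language codes so that `ja` passes validation.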
@eu9ene added the language-coverage label Dec 17, 2024
Collaborator Author

eu9ene commented Dec 17, 2024

I added Japanese to OpusCleaner; the change is waiting to be merged: hplt-project/OpusCleaner#163

@eu9ene self-assigned this Dec 21, 2024