Commit
* Switch best bleu to chrf
* Add requirements
* Use the same version of sacrebleu and mtdata everywhere
* Fix kind
* Install requirements
* Fix chrf score function
* Update find-corpus
* Skip mtdata flores datasets
* Skip flores 200
* Skip WMT news
* Clarify comment
* Rename bestbleu to extract_best
* Use full argument names
* Add tests for extract best
Showing 15 changed files with 1,462 additions and 993 deletions.
@@ -0,0 +1 @@
sacrebleu==2.4.2
@@ -0,0 +1,20 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile pipeline/translate/requirements/extract_best.in
#
colorama==0.4.6
    # via sacrebleu
lxml==5.3.0
    # via sacrebleu
numpy==2.1.2
    # via sacrebleu
portalocker==2.10.1
    # via sacrebleu
regex==2024.9.11
    # via sacrebleu
sacrebleu==2.4.2
    # via -r pipeline/translate/requirements/extract_best.in
tabulate==0.9.0
    # via sacrebleu
@@ -0,0 +1,46 @@
import os

from fixtures import DataDir

nbest = """0 ||| Реформа, направленная на выдвижение условий, идет слишком медленно. ||| F0= -9.21191 F1= -11.53 ||| -1.22059
0 ||| Реформа, направленная на выдвижение условий, проходит слишком медленно. ||| F0= -10.1025 F1= -11.1262 ||| -1.24908
0 ||| Реформа условий была слишком медленной. ||| F0= -6.67615 F1= -6.21271 ||| -1.28906
0 ||| Реформа, направленная на выдвижение условий, идет слишком медленными темпами. ||| F0= -12.3186 F1= -13.4513 ||| -1.28906
0 ||| Реформа, направленная на выдвижение условий, осуществляется слишком медленно. ||| F0= -9.85156 F1= -12.1434 ||| -1.29412
0 ||| Реформа до обусловленности была слишком медленной. ||| F0= -9.59259 F1= -6.63153 ||| -1.35026
0 ||| Реформа системы обусловленности была слишком медленной. ||| F0= -10.0087 F1= -7.58777 ||| -1.46484
0 ||| Реформа системы обусловленности идет слишком медленно. ||| F0= -9.89215 F1= -6.91754 ||| -1.52699
1 ||| Помощь по-прежнему носит фрагментарный характер, а доноры не координируют свою деятельность. ||| F0= -7.94812 F1= -8.49457 ||| -0.821875
1 ||| Помощь по-прежнему раздроблена, а доноры не координируют свою деятельность. ||| F0= -7.23773 F1= -8.77496 ||| -0.842928
1 ||| Помощь по-прежнему фрагментирована, а доноры не координируют свою деятельность. ||| F0= -8.4671 F1= -6.81989 ||| -0.848524
1 ||| Помощь остается раздробленной, а доноры - нескоординированными. ||| F0= -7.95831 F1= -8.09065 ||| -0.891493
1 ||| Помощь по-прежнему носит фрагментарный характер, а доноры не координируют свои действия. ||| F0= -9.28394 F1= -9.13013 ||| -0.920313
1 ||| Помощь по-прежнему раздроблена, а доноры не координируются. ||| F0= -6.3092 F1= -9.73718 ||| -0.943934
1 ||| Помощь по-прежнему раздроблена, а доноры не координируют свои усилия. ||| F0= -8.31525 F1= -9.74164 ||| -0.949836
1 ||| Помощь по-прежнему фрагментирована, а доноры не координируются. ||| F0= -7.51941 F1= -7.84265 ||| -0.959961"""

refs = """Реформирование кондициональности проходит слишком медленно.
Помощь по-прежнему оказывается фрагментарно, а действия доноров не координируются."""


def test_extract_best_chr():
    data_dir = DataDir("test_extract_best")
    data_dir.create_file("file.1.nbest", nbest)
    data_dir.create_file("file.1.ref", refs)
    data_dir.mkdir("artifacts")
    env = {
        "TEST_ARTIFACTS": data_dir.path,
        "SRC": "en",
        "TRG": "ru",
    }

    data_dir.run_task("extract-best-en-ru-1/10", env=env)

    output_file = os.path.join(data_dir.path, "artifacts", "file.1.nbest.out")
    assert os.path.isfile(output_file)
    with open(output_file, "r") as f:
        output = f.read()
    assert (
        output == "Реформа, направленная на выдвижение условий, проходит слишком медленно.\n"
        "Помощь по-прежнему носит фрагментарный характер, а доноры не координируют свои действия.\n"
    )
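
The test above feeds an n-best list in the `id ||| text ||| features ||| score` layout and expects one hypothesis per reference line. The renamed `extract_best` script itself is not rendered on this page, so the following is only a minimal sketch of the general idea, not the commit's implementation. The names `parse_nbest`, `char_ngram_f`, and `pick_best` are hypothetical, and `char_ngram_f` is a simplified character n-gram F-score used as a stdlib-only stand-in; it is not sacrebleu's chrF and may rank close candidates differently. In practice a call such as `sacrebleu.sentence_chrf(hyp, [ref]).score` could presumably be passed in as the scorer instead.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List


def parse_nbest(text: str) -> Dict[int, List[str]]:
    """Group hypothesis texts by sentence id from 'id ||| text ||| ...' lines."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(" ||| ")
        groups[int(fields[0])].append(fields[1])
    return dict(groups)


def char_ngram_f(hyp: str, ref: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-beta (a stand-in, NOT sacrebleu's chrF)."""

    def ngrams(s: str, n: int) -> Counter:
        s = s.replace(" ", "")  # ignore whitespace when counting characters
        return Counter(s[i : i + n] for i in range(len(s) - n + 1))

    scores = []
    for n in range(1, max_order + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        if not h or not r:
            continue  # string too short for this n-gram order
        overlap = sum((h & r).values())  # clipped n-gram matches
        precision = overlap / sum(h.values())
        recall = overlap / sum(r.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        b2 = beta**2
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


def pick_best(
    nbest_text: str,
    refs: List[str],
    score: Callable[[str, str], float] = char_ngram_f,
) -> List[str]:
    """For each reference line, keep the hypothesis that scores highest against it."""
    groups = parse_nbest(nbest_text)
    return [max(groups[i], key=lambda h: score(h, ref)) for i, ref in enumerate(refs)]


# Toy data (hypothetical, not from the commit): two sentences, two candidates each.
toy_nbest = (
    "0 ||| the cat sat ||| F0= -1.0 ||| -0.1\n"
    "0 ||| a dog ran ||| F0= -2.0 ||| -0.2\n"
    "1 ||| hello world ||| F0= -1.0 ||| -0.1\n"
    "1 ||| goodbye moon ||| F0= -1.0 ||| -0.3"
)
toy_refs = ["a dog ran", "hello world"]
print(pick_best(toy_nbest, toy_refs))
```

Keeping the scorer as a parameter means the selection logic does not care whether the metric is BLEU, chrF, or something else, which mirrors the metric swap this commit describes.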