diff --git a/README.md b/README.md
index 8fc8fc5167..54ec693ee0 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ Here are some of the features we support:
   - [**Instruction following**](https://nvidia.github.io/NeMo-Skills/evaluation/instruction-following): e.g. [ifbench](https://nvidia.github.io/NeMo-Skills/evaluation/instruction-following/#ifbench), [ifeval](https://nvidia.github.io/NeMo-Skills/evaluation/instruction-following/#ifeval)
   - [**Long-context**](https://nvidia.github.io/NeMo-Skills/evaluation/long-context): e.g. [ruler](https://nvidia.github.io/NeMo-Skills/evaluation/long-context/#ruler), [mrcr](https://nvidia.github.io/NeMo-Skills/evaluation/long-context/#mrcr), [aalcr](https://nvidia.github.io/NeMo-Skills/evaluation/long-context/#aalcr)
   - [**Tool-calling**](https://nvidia.github.io/NeMo-Skills/evaluation/tool-calling): e.g. [bfcl_v3](https://nvidia.github.io/NeMo-Skills/evaluation/tool-calling/#bfcl_v3)
-  - [**Multilingual**](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual): e.g. [mmlu-prox](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual/#mmlu-prox)
+  - [**Multilingual**](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual): e.g. [mmlu-prox](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual/#mmlu-prox), [FLORES-200](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual/#FLORES-200), [wmt24pp](https://nvidia.github.io/NeMo-Skills/evaluation/multilingual/#wmt24pp)
 - Easily parallelize each evaluation across many slurm jobs, self-host LLM judges, bring your own prompts or change benchmark configuration in any other way.
 - [Model training](https://nvidia.github.io/NeMo-Skills/pipelines/training): Train models using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner/), [NeMo-RL](https://github.com/NVIDIA/NeMo-RL/) or [verl](https://github.com/volcengine/verl).
diff --git a/docs/evaluation/index.md b/docs/evaluation/index.md
index f5b6f3f345..41fdadfe28 100644
--- a/docs/evaluation/index.md
+++ b/docs/evaluation/index.md
@@ -9,7 +9,7 @@ We support many popular benchmarks and it's easy to add new ones in the future.
 - [**Instruction following**](./instruction-following.md): e.g. [ifbench](./instruction-following.md#ifbench), [ifeval](./instruction-following.md#ifeval)
 - [**Long-context**](./long-context.md): e.g. [ruler](./long-context.md#ruler), [mrcr](./long-context.md#mrcr)
 - [**Tool-calling**](./tool-calling.md): e.g. [bfcl_v3](./tool-calling.md#bfcl_v3)
-- [**Multilingual**](./multilingual.md): e.g. [mmlu-prox](./multilingual.md#mmlu-prox)
+- [**Multilingual**](./multilingual.md): e.g. [mmlu-prox](./multilingual.md#mmlu-prox), [flores-200](./multilingual.md#FLORES-200), [wmt24pp](./multilingual.md#wmt24pp)
 
 See [nemo_skills/dataset](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset) where each folder is a benchmark we support.
@@ -246,4 +246,4 @@ To create a new benchmark follow this process:
    prompt config in `GENERATION_ARGS` and evaluation / metric parameters. But if extra customization is needed for the generation,
    you can provide a fully custom generation module. See [scicode](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/scicode/__init__.py) or [swe-bench](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/swe-bench/__init__.py) for examples of this.
 4. Create a new [evaluation class](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/evaluation/evaluator/__init__.py) (if cannot re-use existing one).
-5. Create a new [metrics class](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/evaluation/metrics/map_metrics.py) ( if cannot re-use existing one).
\ No newline at end of file
+5. Create a new [metrics class](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/evaluation/metrics/map_metrics.py) (if you cannot re-use an existing one).
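The benchmark-creation steps in the hunk above boil down to a small definition module per benchmark. As a hedged sketch (the constant values below mirror the `flores200/__init__.py` added later in this patch; for a different benchmark they are placeholders, not required values):

```python
# Minimal nemo_skills/dataset/<benchmark>/__init__.py sketch.
# These module-level constants are how a benchmark declares its defaults;
# the values here are illustrative, taken from the flores200 benchmark.

PROMPT_CONFIG = "multilingual/segment-translation"  # which prompt template to use
DATASET_GROUP = "chat"                              # generation group for the benchmark
METRICS_TYPE = "translation"                        # key looked up in METRICS_MAP
EVAL_ARGS = "++eval_type=no-op"                     # default evaluator arguments
GENERATION_ARGS = ""                                # extra generation overrides, if any
```

All of these can still be overridden from the command line, as the docs note.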
diff --git a/docs/evaluation/long-context.md b/docs/evaluation/long-context.md
index 52c3aae59e..506bfb10e9 100644
--- a/docs/evaluation/long-context.md
+++ b/docs/evaluation/long-context.md
@@ -49,4 +49,4 @@ ns eval \
 The results, including per-category scores, are stored in metrics.json. Detailed breakdowns by category and sequence length are also available via
 ```
 ns summarize_results --cluster=
-```
\ No newline at end of file
+```
diff --git a/docs/evaluation/multilingual.md b/docs/evaluation/multilingual.md
index 6ace23aafd..bebe25d981 100644
--- a/docs/evaluation/multilingual.md
+++ b/docs/evaluation/multilingual.md
@@ -1,6 +1,6 @@
 # Multilingual
 
-Our multilingual benchmarks cover things like multilingual reasoning as well as machine translation (to be added).
+Our multilingual benchmarks cover multilingual reasoning as well as machine translation.
 
 All benchmarks in this category have an extra `--language` argument in their associated `ns prepare` command, which allows you to choose which language(s) of the benchmark to run.
 
@@ -9,7 +9,7 @@ Once prepared, the `ns eval` command will run on all languages prepared, and the
 
 ### mmlu-prox
 
-- Benchmark is defined in [`nemo_skills/dataset/mmlu-pro/__init__.py`](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/mmlu-prox/__init__.py)
+- Benchmark is defined in [`nemo_skills/dataset/mmlu-prox/__init__.py`](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/mmlu-prox/__init__.py)
 - Original benchmark source is [here](https://huggingface.co/datasets/li-lab/MMLU-ProX). Our evaluation template and answer extraction mechanism try to match the configuration in [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/mmlu_prox).
@@ -68,4 +68,150 @@ Some reference numbers and commands for reproduction:
     ++inference.temperature=0.6 \
     ++inference.top_k=20 \
     ++inference.tokens_to_generate=38912
-    ```
\ No newline at end of file
+    ```
+
+### FLORES-200
+
+- Benchmark is defined in [`nemo_skills/dataset/flores200/__init__.py`](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/flores200/__init__.py)
+- Original benchmark source is [here](https://huggingface.co/datasets/openlanguagedata/flores_plus).
+
+Some reference numbers for the devtest split (xx corresponds to the average over 5 languages: de, es, fr, it, ja):
+
+| Model                 | en->xx | xx->en | xx->xx |
+|:----------------------|-------:|-------:|-------:|
+| Nemotron-NanoV2-9B-v2 |   32.5 |   34.0 |   25.9 |
+| Qwen3-8B              |   31.5 |   34.6 |   25.7 |
+| Qwen3-30B-A3B         |   33.3 |   35.5 |   27.1 |
+| gpt-oss-20B           |   32.4 |   34.1 |   25.0 |
+
+=== "Nemotron-NanoV2-9B-v2"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=NVIDIA/Nemotron-Nano-9B-v2 \
+        --benchmarks flores200 \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=devtest \
+        ++inference.tokens_to_generate=512 \
+        ++system_message='/no_think'
+    ```
+
+=== "Qwen3-8B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=Qwen/Qwen3-8B \
+        --benchmarks flores200 \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=devtest \
+        ++inference.tokens_to_generate=512 \
+        ++prompt_suffix='/no_think'
+    ```
+
+=== "Qwen3-30B-A3B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=Qwen/Qwen3-30B-A3B \
+        --benchmarks flores200 \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=devtest \
+        ++inference.tokens_to_generate=512 \
+        ++prompt_suffix='/no_think'
+    ```
+
+=== "gpt-oss-20B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=openai/gpt-oss-20b \
+        --benchmarks flores200 \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=devtest \
+        ++inference.tokens_to_generate=2048
+    ```
+
+### wmt24pp
+
+- Benchmark is defined in [`nemo_skills/dataset/wmt24pp/__init__.py`](https://github.com/NVIDIA/NeMo-Skills/blob/main/nemo_skills/dataset/wmt24pp/__init__.py)
+- Original benchmark source is [here](https://huggingface.co/datasets/google/wmt24pp).
+
+Some reference numbers for the test split (xx corresponds to the average over 5 languages: de, es, fr, it, ja):
+
+| Model                 | en->de | en->es | en->fr | en->it | en->ja | en->xx |
+|:----------------------|-------:|-------:|-------:|-------:|-------:|-------:|
+| Nemotron-NanoV2-9B-v2 |   25.3 |   37.7 |   33.4 |   33.8 |   20.9 |   30.2 |
+| Qwen3-8B              |   26.2 |   38.5 |   33.1 |   33.1 |   21.7 |   30.5 |
+| Qwen3-30B-A3B         |   28.5 |   40.0 |   35.1 |   36.0 |   23.2 |   32.5 |
+| gpt-oss-20B           |   27.3 |   42.3 |   32.8 |   34.9 |   25.2 |   32.5 |
+
+=== "Nemotron-NanoV2-9B-v2"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=NVIDIA/Nemotron-Nano-9B-v2 \
+        --benchmarks wmt24pp \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=test \
+        ++inference.tokens_to_generate=512 \
+        ++system_message='/no_think'
+    ```
+
+=== "Qwen3-8B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=Qwen/Qwen3-8B \
+        --benchmarks wmt24pp \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=test \
+        ++inference.tokens_to_generate=512 \
+        ++prompt_suffix='/no_think'
+    ```
+
+=== "Qwen3-30B-A3B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=Qwen/Qwen3-30B-A3B \
+        --benchmarks wmt24pp \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=test \
+        ++inference.tokens_to_generate=512 \
+        ++prompt_suffix='/no_think'
+    ```
+
+=== "gpt-oss-20B"
+
+    ```bash
+    ns eval \
+        --cluster=[cluster] \
+        --model=openai/gpt-oss-20b \
+        --benchmarks wmt24pp \
+        --output_dir=[output dir] \
+        --server_type=vllm \
+        --server_gpus=8 \
+        --split=test \
+        ++inference.tokens_to_generate=2048
+    ```
diff --git a/docs/index.md b/docs/index.md
index ea6a986218..e96ac847b9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,7 +21,8 @@ Here are some of the features we support:
   - [**Instruction following**](./evaluation/instruction-following.md): e.g. [ifbench](./evaluation/instruction-following.md#ifbench), [ifeval](./evaluation/instruction-following.md#ifeval)
   - [**Long-context**](./evaluation/long-context.md): e.g. [ruler](./evaluation/long-context.md#ruler), [mrcr](./evaluation/long-context.md#mrcr)
   - [**Tool-calling**](./evaluation/tool-calling.md): e.g. [bfcl_v3](./evaluation/tool-calling.md#bfcl_v3)
-  - [**Robustness Evaluation**](./evaluation/robustness.md): Evaluate model sensitvity against changes in prompt.
+  - [**Multilingual capabilities**](./evaluation/multilingual.md): e.g. [mmlu-prox](./evaluation/multilingual.md#mmlu-prox), [flores-200](./evaluation/multilingual.md#FLORES-200), [wmt24pp](./evaluation/multilingual.md#wmt24pp)
+  - [**Robustness evaluation**](./evaluation/robustness.md): Evaluate model sensitivity to changes in the prompt.
 - Easily parallelize each evaluation across many Slurm jobs, self-host LLM judges, bring your own prompts or change benchmark configuration in any other way.
 - [Model training](pipelines/training.md): Train models using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner/), [NeMo-RL](https://github.com/NVIDIA/NeMo-RL/) or [verl](https://github.com/volcengine/verl).
diff --git a/nemo_skills/dataset/flores200/__init__.py b/nemo_skills/dataset/flores200/__init__.py
new file mode 100644
index 0000000000..86a7f76717
--- /dev/null
+++ b/nemo_skills/dataset/flores200/__init__.py
@@ -0,0 +1,22 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+# settings that define how evaluation should be done by default (all can be changed from cmdline)
+
+PROMPT_CONFIG = "multilingual/segment-translation"
+DATASET_GROUP = "chat"
+METRICS_TYPE = "translation"
+EVAL_ARGS = "++eval_type=no-op"
+GENERATION_ARGS = ""
diff --git a/nemo_skills/dataset/flores200/prepare.py b/nemo_skills/dataset/flores200/prepare.py
new file mode 100644
index 0000000000..7a427e0f1f
--- /dev/null
+++ b/nemo_skills/dataset/flores200/prepare.py
@@ -0,0 +1,73 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import json
+from pathlib import Path
+
+from datasets import load_dataset
+from langcodes import Language
+
+
+def write_data_to_file(output_file, datasets, src_languages, tgt_languages):
+    with open(output_file, "wt", encoding="utf-8") as fout:
+        for src_lang in src_languages:
+            for tgt_lang in tgt_languages:
+                if src_lang != tgt_lang:
+                    for src, tgt in zip(datasets[src_lang], datasets[tgt_lang], strict=True):
+                        json_dict = {
+                            "text": src,
+                            "translation": tgt,
+                            "source_language": src_lang,
+                            "target_language": tgt_lang,
+                            "source_lang_name": Language(src_lang).display_name(),
+                            "target_lang_name": Language(tgt_lang).display_name(),
+                        }
+                        json.dump(json_dict, fout)
+                        fout.write("\n")
+
+
+def main(args):
+    all_languages = list(set(args.source_languages).union(set(args.target_languages)))
+
+    datasets = {}
+    for lang in all_languages:
+        iso_639_3 = Language(lang).to_alpha3()
+        iso_15924 = Language(lang).maximize().script
+        lang_code = f"{iso_639_3}_{iso_15924}"
+        datasets[lang] = load_dataset("openlanguagedata/flores_plus", lang_code, split=args.split)["text"]
+
+    data_dir = Path(__file__).absolute().parent
+    data_dir.mkdir(exist_ok=True)
+    output_file = data_dir / f"{args.split}.jsonl"
+    write_data_to_file(output_file, datasets, src_languages=args.source_languages, tgt_languages=args.target_languages)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--split", default="dev", choices=("dev", "devtest"), help="Dataset split to process.")
+    parser.add_argument(
+        "--source_languages",
+        default=["en", "de", "es", "fr", "it", "ja"],
+        nargs="+",
+        help="Languages to translate from.",
+    )
+    parser.add_argument(
+        "--target_languages",
+        default=["en", "de", "es", "fr", "it", "ja"],
+        nargs="+",
+        help="Languages to translate to.",
+    )
+    args = parser.parse_args()
+    main(args)
diff --git a/nemo_skills/dataset/wmt24pp/__init__.py b/nemo_skills/dataset/wmt24pp/__init__.py
new file mode 100644
index 0000000000..86a7f76717
--- /dev/null
+++ b/nemo_skills/dataset/wmt24pp/__init__.py
@@ -0,0 +1,22 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+# settings that define how evaluation should be done by default (all can be changed from cmdline)
+
+PROMPT_CONFIG = "multilingual/segment-translation"
+DATASET_GROUP = "chat"
+METRICS_TYPE = "translation"
+EVAL_ARGS = "++eval_type=no-op"
+GENERATION_ARGS = ""
diff --git a/nemo_skills/dataset/wmt24pp/prepare.py b/nemo_skills/dataset/wmt24pp/prepare.py
new file mode 100644
index 0000000000..c97ee351c0
--- /dev/null
+++ b/nemo_skills/dataset/wmt24pp/prepare.py
@@ -0,0 +1,60 @@
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import json
+from pathlib import Path
+
+from datasets import load_dataset
+from langcodes import Language
+
+
+def write_data_to_file(output_file, datasets, tgt_languages):
+    with open(output_file, "wt", encoding="utf-8") as fout:
+        for tgt_lang in tgt_languages:
+            for src, tgt in zip(datasets[tgt_lang]["source"], datasets[tgt_lang]["target"], strict=True):
+                json_dict = {
+                    "text": src,
+                    "translation": tgt,
+                    "source_language": "en",
+                    "target_language": tgt_lang,
+                    "source_lang_name": "English",
+                    "target_lang_name": Language(tgt_lang[:2]).display_name(),
+                }
+                json.dump(json_dict, fout)
+                fout.write("\n")
+
+
+def main(args):
+    datasets = {}
+    for lang in args.target_languages:
+        datasets[lang] = load_dataset("google/wmt24pp", f"en-{lang}")["train"]
+
+    data_dir = Path(__file__).absolute().parent
+    data_dir.mkdir(exist_ok=True)
+    output_file = data_dir / f"{args.split}.jsonl"
+    write_data_to_file(output_file, datasets, tgt_languages=args.target_languages)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--split", default="test", choices=("test",), help="Dataset split to process.")
+    parser.add_argument(
+        "--target_languages",
+        default=["de_DE", "es_MX", "fr_FR", "it_IT", "ja_JP"],
+        nargs="+",
+        help="Languages to translate to.",
+    )
+    args = parser.parse_args()
+    main(args)
diff --git a/nemo_skills/evaluation/metrics/map_metrics.py b/nemo_skills/evaluation/metrics/map_metrics.py
index 90f3feebea..8eea47930e 100644
--- a/nemo_skills/evaluation/metrics/map_metrics.py
+++ b/nemo_skills/evaluation/metrics/map_metrics.py
@@ -33,6 +33,7 @@
 from nemo_skills.evaluation.metrics.mrcr_metrics import MRCRMetrics
 from nemo_skills.evaluation.metrics.ruler_metrics import RulerMetrics
 from nemo_skills.evaluation.metrics.simpleqa_metrics import SimpleQAMetrics
+from nemo_skills.evaluation.metrics.translation_metrics import TranslationMetrics
 
 METRICS_MAP = {
     "math": MathMetrics,
@@ -56,6 +57,7 @@
     "aalcr": AALCRMetrics,
     "livebench_coding": LiveCodeBenchMetrics,
     "ojbench": OJBenchMetrics,
+    "translation": TranslationMetrics,
     "human_eval_infilling": HumanEvalInfillingMetrics,
 }
diff --git a/nemo_skills/evaluation/metrics/translation_metrics.py b/nemo_skills/evaluation/metrics/translation_metrics.py
new file mode 100644
index 0000000000..dbeadbede8
--- /dev/null
+++ b/nemo_skills/evaluation/metrics/translation_metrics.py
@@ -0,0 +1,81 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from collections import defaultdict
+
+from sacrebleu import corpus_bleu
+
+from nemo_skills.evaluation.metrics.base import BaseMetrics, as_float
+
+
+class TranslationMetrics(BaseMetrics):
+    # TODO: refactor BLEU computation so it reuses parent method functions from pass@k
+    # TODO: add support for other translation metrics, such as COMET and MetricX
+
+    def get_metrics(self):
+        metrics_dict = {}
+        for key in self.translation_dict:
+            src_lang, tgt_lang = key.split("->")
+            preds = self.translation_dict[key]["preds"]
+            gts = self.translation_dict[key]["gts"]
+
+            tokenize = "13a"
+            if tgt_lang[:2] == "ja":
+                tokenize = "ja-mecab"
+            if tgt_lang[:2] == "zh":
+                tokenize = "zh"
+            if tgt_lang[:2] == "ko":
+                tokenize = "ko-mecab"
+
+            bleu_score = corpus_bleu(preds, [gts], tokenize=tokenize).score
+            metrics_dict[key] = {"bleu": bleu_score}
+            self.aggregation_dict["xx->xx"].append(bleu_score)
+            self.aggregation_dict[f"{src_lang}->xx"].append(bleu_score)
+            self.aggregation_dict[f"xx->{tgt_lang}"].append(bleu_score)
+
+        for key in self.aggregation_dict:
+            metrics_dict[key] = {"bleu": sum(self.aggregation_dict[key]) / len(self.aggregation_dict[key])}
+
+        return metrics_dict
+
+    def update(self, predictions):
+        """Updating the evaluation results with the current element.
+
+        Args:
+            predictions (list[dict]): aggregated predictions across all generations.
+                The content of the file is benchmark specific.
+ """ + super().update(predictions) + + for pred in predictions: + src_lang = pred["source_language"] + tgt_lang = pred["target_language"] + generation = pred["generation"] + ground_truth = pred["translation"] + + self.translation_dict[f"{src_lang}->{tgt_lang}"]["preds"].append(generation) + self.translation_dict[f"{src_lang}->{tgt_lang}"]["gts"].append(ground_truth) + + def reset(self): + super().reset() + self.translation_dict = defaultdict(lambda: defaultdict(list)) + self.aggregation_dict = defaultdict(list) + + def evaluations_to_print(self): + """Returns all translation pairs and aggregated multilingual dictionaries.""" + return list(self.translation_dict.keys()) + list(self.aggregation_dict.keys()) + + def metrics_to_print(self): + metrics_to_print = {"bleu": as_float} + return metrics_to_print diff --git a/nemo_skills/prompt/config/multilingual/__init__.py b/nemo_skills/prompt/config/multilingual/__init__.py new file mode 100644 index 0000000000..341a77c5bc --- /dev/null +++ b/nemo_skills/prompt/config/multilingual/__init__.py @@ -0,0 +1,13 @@ +# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/nemo_skills/prompt/config/multilingual/segment-translation.yaml b/nemo_skills/prompt/config/multilingual/segment-translation.yaml
new file mode 100644
index 0000000000..57e1bdc750
--- /dev/null
+++ b/nemo_skills/prompt/config/multilingual/segment-translation.yaml
@@ -0,0 +1,3 @@
+# Default prompt for text translation.
+
+user: "Translate the following segment into {target_lang_name}, without additional explanation.\n\n{text}"
diff --git a/requirements/main.txt b/requirements/main.txt
index a623f88fa6..27fd526960 100644
--- a/requirements/main.txt
+++ b/requirements/main.txt
@@ -26,6 +26,7 @@ huggingface_hub
 hydra-core
 ipython
 iso639-lang
+langcodes
 litellm[caching] < 1.75.0 # some bug with asyncio.run hanging forever
 math-verify[antlr4_9_3]
 mcp
@@ -35,6 +36,7 @@ openai
 pyyaml
 rank_bm25
 requests
+sacrebleu
 scikit-learn
 sdp @ git+https://github.com/NVIDIA/NeMo-speech-data-processor@29b9b1ec0ceaf3ffa441c1d01297371b3f8e11d2
 sympy
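The `segment-translation.yaml` prompt above is filled from the `target_lang_name` and `text` fields that the prepare scripts write to each `.jsonl` record; rendering it is plain string formatting. A sketch with a made-up record (the exact rendering machinery in nemo_skills may differ):

```python
# The user template from segment-translation.yaml; the record mimics one line
# of the prepared .jsonl files (values invented for illustration).
template = "Translate the following segment into {target_lang_name}, without additional explanation.\n\n{text}"
record = {"text": "Hello, world!", "target_lang_name": "German"}

prompt = template.format(**record)
print(prompt.splitlines()[0])
# Translate the following segment into German, without additional explanation.
```

The instruction and the source segment are separated by a blank line, so the model sees the segment as a distinct block to translate.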