
Commit c8aad08

lint; fix doc; revert pytest skip
Signed-off-by: Yuki Huang <[email protected]>
1 parent 60aca3b commit c8aad08


4 files changed (+16, -10 lines)


docs/guides/eval.md

Lines changed: 8 additions & 9 deletions
@@ -89,12 +89,11 @@ score=0.1000 (3.0/30)
 
 ## List of currently supported benchmarks
 
-- [AIME-2024](../../nemo_rl/data/eval_datasets/aime2024.py): the corresponding `data.dataset_name` is `"aime2024"`.
-- [AIME-2025](../../nemo_rl/data/eval_datasets/aime2025.py): the corresponding `data.dataset_name` is `"aime2025"`.
-- [GPQA and GPQA-diamond](../../nemo_rl/data/eval_datasets/gpqa.py): the corresponding `data.dataset_name` are `"gpqa"` and `"gpqa-diamond"`.
-- [MATH and MATH-500](../../nemo_rl/data/eval_datasets/math.py): the corresponding `data.dataset_name` are `"math"` and `"math500"`.
-- [MMLU](../../nemo_rl/data/eval_datasets/mmlu.py): this also includes MMMLU (Multilingual MMLU), a total of 14 languages. When `data.dataset_name` is set to `mmlu`, the English version is used. If one wants to run evaluation on another language, `data.dataset_name` should be set to `mmlu_{language}` where `language` is one of following 14 values, `["AR-XY", "BN-BD", "DE-DE", "ES-LA", "FR-FR", "HI-IN", "ID-ID", "IT-IT", "JA-JP", "KO-KR", "PT-BR", "ZH-CN", "SW-KE", "YO-NG"]`.
-- [MMLU-Pro](../../nemo_rl/data/eval_datasets/mmlu_pro.py): the corresponding `data.dataset_name` is `"mmlu_pro"`.
-
-More details can be found in [load_eval_dataset](../../nemo_rl/data/eval_datasets/__init__.py).
-
+- [AIME-2024](../../nemo_rl/data/datasets/eval_datasets/aime2024.py): the corresponding `data.dataset_name` is `"aime2024"`.
+- [AIME-2025](../../nemo_rl/data/datasets/eval_datasets/aime2025.py): the corresponding `data.dataset_name` is `"aime2025"`.
+- [GPQA and GPQA-diamond](../../nemo_rl/data/datasets/eval_datasets/gpqa.py): the corresponding `data.dataset_name` are `"gpqa"` and `"gpqa-diamond"`.
+- [MATH and MATH-500](../../nemo_rl/data/datasets/eval_datasets/math.py): the corresponding `data.dataset_name` are `"math"` and `"math500"`.
+- [MMLU](../../nemo_rl/data/datasets/eval_datasets/mmlu.py): this also includes MMMLU (Multilingual MMLU), a total of 14 languages. When `data.dataset_name` is set to `mmlu`, the English version is used. If one wants to run evaluation on another language, `data.dataset_name` should be set to `mmlu_{language}` where `language` is one of following 14 values, `["AR-XY", "BN-BD", "DE-DE", "ES-LA", "FR-FR", "HI-IN", "ID-ID", "IT-IT", "JA-JP", "KO-KR", "PT-BR", "ZH-CN", "SW-KE", "YO-NG"]`.
+- [MMLU-Pro](../../nemo_rl/data/datasets/eval_datasets/mmlu_pro.py): the corresponding `data.dataset_name` is `"mmlu_pro"`.
+
+More details can be found in [load_eval_dataset](../../nemo_rl/data/datasets/eval_datasets/__init__.py).
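The naming rule in the guide (one `data.dataset_name` per benchmark, plus `mmlu_{language}` for the 14 MMMLU languages) can be checked with a small helper. This is a hypothetical sketch, not part of `nemo_rl`: `SUPPORTED_BASE_NAMES`, `MMMLU_LANGUAGES`, `supported_dataset_names` and `is_supported_dataset_name` are illustrative names, and `load_eval_dataset` remains the authoritative mapping.

```python
# Hypothetical helper, not part of nemo_rl: it only encodes the names listed
# in docs/guides/eval.md. load_eval_dataset is the source of truth.
SUPPORTED_BASE_NAMES = [
    "aime2024",
    "aime2025",
    "gpqa",
    "gpqa-diamond",
    "math",
    "math500",
    "mmlu",
    "mmlu_pro",
]

# The 14 MMMLU language codes accepted as the suffix in mmlu_{language}.
MMMLU_LANGUAGES = [
    "AR-XY", "BN-BD", "DE-DE", "ES-LA", "FR-FR", "HI-IN", "ID-ID",
    "IT-IT", "JA-JP", "KO-KR", "PT-BR", "ZH-CN", "SW-KE", "YO-NG",
]


def supported_dataset_names() -> list[str]:
    """Return every data.dataset_name value listed in the guide."""
    return SUPPORTED_BASE_NAMES + [f"mmlu_{lang}" for lang in MMMLU_LANGUAGES]


def is_supported_dataset_name(name: str) -> bool:
    """Check a config value against the documented names (exact string match)."""
    return name in supported_dataset_names()


print(len(supported_dataset_names()))  # 22: 8 base names + 14 MMMLU languages
print(is_supported_dataset_name("mmlu_JA-JP"))  # True
print(is_supported_dataset_name("mmlu_EN-US"))  # False
```

Because the lists are copied from the guide by hand, a helper like this would need updating whenever a benchmark or language is added.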

examples/run_sft.py

Lines changed: 3 additions & 1 deletion
@@ -111,7 +111,9 @@ def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig, seed: int):
     # add preprocessor if needed
     datum_preprocessor = None
     if data_config["dataset_name"] == "clevr_cogent":
-        from nemo_rl.data.datasets.response_datasets.clevr import format_clevr_cogent_dataset
+        from nemo_rl.data.datasets.response_datasets.clevr import (
+            format_clevr_cogent_dataset,
+        )
 
         datum_preprocessor = partial(format_clevr_cogent_dataset, return_pil=True)
 
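The `clevr_cogent` branch imports its formatter only when that dataset is selected, then uses `functools.partial` to bind `return_pil=True` so later code can call the formatter without passing that flag. The sketch below reproduces that shape with a hypothetical stand-in, `format_vision_example`, and a hypothetical `pick_preprocessor` wrapper, since the rest of `format_clevr_cogent_dataset`'s signature is not shown in this diff.

```python
from functools import partial
from typing import Any, Callable, Optional


def format_vision_example(
    datum: dict[str, Any], return_pil: bool = False
) -> dict[str, Any]:
    """Hypothetical stand-in for format_clevr_cogent_dataset.

    It only records which image mode was requested, so the effect of the
    bound keyword argument is easy to see.
    """
    return {**datum, "image_mode": "pil" if return_pil else "path"}


def pick_preprocessor(
    dataset_name: str,
) -> Optional[Callable[[dict[str, Any]], dict[str, Any]]]:
    # Mirrors the shape of setup_data: no preprocessor unless the dataset needs one.
    datum_preprocessor = None
    if dataset_name == "clevr_cogent":
        # partial() freezes return_pil=True, leaving a callable that takes the datum.
        datum_preprocessor = partial(format_vision_example, return_pil=True)
    return datum_preprocessor


print(pick_preprocessor("other_dataset"))  # None
print(pick_preprocessor("clevr_cogent")({"id": 0}))  # {'id': 0, 'image_mode': 'pil'}
```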

nemo_rl/data/datasets/response_datasets/local_response_dataset.py

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ class LocalResponseDataset:
         input_key: Key for the input text
         output_key: Key for the output text
     """
+
     def __init__(
         self,
         train_ds_path: str,

tests/unit/data/datasets/test_eval_dataset.py

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import pytest
 from transformers import AutoTokenizer
 
 from nemo_rl.data.datasets.eval_datasets import (
@@ -20,6 +21,7 @@
 )
 
 
+@pytest.mark.skip(reason="dataset download is flaky")
 def test_gpqa_dataset():
     tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
     gpqa_dataset = GPQADataset()
@@ -44,6 +46,7 @@ def test_gpqa_dataset():
     )
 
 
+@pytest.mark.skip(reason="dataset download is flaky")
 def test_math_dataset():
     tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
     math_dataset = MathDataset()
@@ -67,6 +70,7 @@ def test_math_dataset():
     )
 
 
+@pytest.mark.skip(reason="dataset download is flaky")
 def test_mmlu_dataset():
     tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
     mmlu_dataset = MMLUDataset()
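The same `@pytest.mark.skip(reason="dataset download is flaky")` decorator is repeated on three tests. One possible refactor, shown here as a sketch rather than what the commit does, is to define the marker once and reuse it; `skip_flaky_download` and the placeholder tests are hypothetical names.

```python
import pytest

# Hypothetical shared marker; the commit instead repeats this decorator inline.
skip_flaky_download = pytest.mark.skip(reason="dataset download is flaky")


@skip_flaky_download
def test_gpqa_dataset_placeholder():
    # Placeholder body: the real tests download a tokenizer and a dataset.
    assert True


@skip_flaky_download
def test_math_dataset_placeholder():
    assert True


# Applying a mark stores it in the function's pytestmark list, which is
# where pytest reads the skip and its reason from during collection.
mark = test_gpqa_dataset_placeholder.pytestmark[0]
print(mark.name, mark.kwargs["reason"])  # skip dataset download is flaky
```

A module-level `pytestmark = skip_flaky_download` would also work, but it would skip every test in the module, including any that do not download data.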
