Adding long context benchmark MRCR by fayejf · Pull Request #634 · NVIDIA-NeMo/Skills

fayejf · 2025-08-01T05:04:49Z

OpenAI MRCR (Multi-round co-reference resolution): a long-context, multiple-needle-in-a-haystack benchmark

Prepare data
By default, it prepares all 2400 samples, with contexts of up to 1M tokens.

ns prepare_data \
    --data_dir=/workspace/ns-data \
    --cluster=fei-ord \
    mrcr

Or you can prepare a subset, e.g. only the 2-needle samples that fit in 131072 (128k) tokens:

ns prepare_data \
    --data_dir=/workspace/ns-data \
    --cluster=fei-ord \
    mrcr --max_context_window 131072 --needles_subset 2 --setup needle2_128k
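The subset options can be pictured as a simple filter over the raw MRCR rows. This is a minimal sketch, not the actual prepare.py. It assumes each row has a JSON-encoded "prompt" (a list of chat messages) and an "n_needles" field, and it uses a whitespace word count as a stand-in for the tiktoken-based count the real script relies on. The names select_subset, write_split and count_tokens are hypothetical; only the f"{setup}.jsonl" output naming comes from this PR's diff.

```python
import json
from pathlib import Path


def count_tokens(messages):
    # Stand-in tokenizer (assumption): whitespace word count over all message contents.
    # The real script uses tiktoken, which will give different (larger) counts.
    return sum(len(m["content"].split()) for m in messages)


def select_subset(rows, max_context_window, needles_subset):
    """Keep rows whose needle count is in needles_subset and whose prompt fits the window."""
    selected = []
    for row in rows:
        if row["n_needles"] not in needles_subset:
            continue
        messages = json.loads(row["prompt"])
        if count_tokens(messages) > max_context_window:
            continue
        selected.append(row)
    return selected


def write_split(rows, data_dir, setup):
    # One JSON object per line, written to <data_dir>/<setup>.jsonl
    output_file = Path(data_dir) / f"{setup}.jsonl"
    with open(output_file, "w", encoding="utf-8") as fout:
        for row in rows:
            fout.write(json.dumps(row) + "\n")
    return output_file
```

With needles_subset={2} and max_context_window=131072, this would mirror the needle2_128k setup above, up to the difference in tokenizer.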

Run evaluation
Specify an eval split, or use the one saved in __init__.py (the default is all).

model=Meta-Llama-3.1-8B-Instruct
split=needle2_64k
ns eval \
    --cluster=fei-ord \
    --data_dir=/workspace/ns-data \
    --server_type=vllm \
    --model=/hf_models/$model \
    --server_gpus=8 \
    --benchmarks=mrcr:0 \
    --split=$split \
    --output_dir=/workspace/results/mrcr/split/$model
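To evaluate several prepared splits in one go, one option is to generate the same ns eval command once per split. This is a hypothetical helper, not part of nemo-skills: it only reuses the flags from the command above, and the split names passed in must match files you actually prepared. As a design choice (an assumption, not what the example above does), it writes each split to its own output directory so results do not overwrite each other.

```python
import shlex
import subprocess


def build_eval_cmd(model, split, cluster="fei-ord", data_dir="/workspace/ns-data"):
    # Mirrors the ns eval invocation above; every flag is copied from that example.
    return [
        "ns", "eval",
        f"--cluster={cluster}",
        f"--data_dir={data_dir}",
        "--server_type=vllm",
        f"--model=/hf_models/{model}",
        "--server_gpus=8",
        "--benchmarks=mrcr:0",
        f"--split={split}",
        # One output directory per split (assumption, see lead-in).
        f"--output_dir=/workspace/results/mrcr/{split}/{model}",
    ]


def run_all(model, splits, dry_run=True):
    """Print (dry_run=True) or execute one ns eval command per split."""
    cmds = [build_eval_cmd(model, s) for s in splits]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

For example, run_all("Meta-Llama-3.1-8B-Instruct", ["needle2_64k", "needle2_128k"]) prints two commands without launching anything.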

Signed-off-by: fayejf <fayejf07@gmail.com>

Kipok

Thanks! Just a few small changes are needed

Kipok · 2025-08-01T22:58:12Z

nemo_skills/dataset/mrcr/prepare.py

+from tqdm import tqdm
+import tempfile
+
+subprocess.run(["pip install tiktoken"], check=True, shell=True)


Please move it inside the function where it's needed. Otherwise, this will run on every import, even when the script isn't called.

Good point! Changed.
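The requested pattern is to defer the install and import until the function that needs them actually runs. A minimal sketch, assuming a hypothetical helper named import_or_install; it calls pip through sys.executable -m pip rather than the shell string in the diff, so the package lands in the interpreter that is running the script. The o200k_base encoding is an assumption taken from OpenAI's reference MRCR snippet.

```python
import importlib
import subprocess
import sys


def import_or_install(module_name, pip_name=None):
    """Import a module, installing it with pip only if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", pip_name or module_name],
            check=True,
        )
        importlib.invalidate_caches()  # make the freshly installed package visible
        return importlib.import_module(module_name)


def count_tokens(text):
    # Nothing is installed at import time; tiktoken is only pulled in when this runs.
    tiktoken = import_or_install("tiktoken")
    enc = tiktoken.get_encoding("o200k_base")
    return len(enc.encode(text))
```

Importing a module that defines these functions has no side effects; pip only runs if count_tokens is called and tiktoken is missing.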


Kipok · 2025-08-01T23:02:27Z

nemo_skills/dataset/mrcr/prepare.py

+    output_file = data_dir / f"{setup}.jsonl"
+
+    with open(data_dir / "__init__.py", "w", encoding="utf-8") as init_file:
+        init_file.write(f"EVAL_SPLIT = '{setup}'\n")


Best not to override __init__.py dynamically here. Users can always provide the --split argument to change this, so there's no need to change the defaults.

Got it. Changed!


nemo_skills/dataset/mrcr/__init__.py

Kipok · 2025-08-06T03:40:04Z

Or is it supposed to use that messages list directly (i.e., complete the last turn of a long multi-turn conversation)? In that case, you should put it as a list under the "messages" key and then set ++prompt_format=openai in the generation args.


fayejf · 2025-08-06T23:26:04Z


Wait, do we support that? I wanted it, but I didn't know it was supported.
I think people usually do it this way: complete the last turn of a long multi-turn conversation, as in the snippet below. But is this supported for all models in nemo-skills?

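import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4.1"  # example model name; set accordingly
# row: one MRCR sample; row["prompt"] is a JSON-encoded list of chat messages
messages = json.loads(row["prompt"])
completion = client.chat.completions.create(
    model=MODEL,
    messages=messages,
)
response = completion.choices[0].message.content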
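For context, the OpenAI MRCR dataset card scores each response roughly as follows: the response must start with the sample's random_string_to_prepend, and the score is the difflib.SequenceMatcher ratio between response and answer after stripping that prefix. This sketch reproduces that reference logic; whether the nemo-skills MRCR evaluator added in this PR matches it exactly is not shown in this thread.

```python
from difflib import SequenceMatcher


def grade(response, answer, random_string_to_prepend):
    """Score in [0, 1]; 0 if the required prefix is missing from the response."""
    if not response.startswith(random_string_to_prepend):
        return 0.0
    # Compare only the content after the shared random prefix (Python 3.9+ removeprefix).
    response = response.removeprefix(random_string_to_prepend)
    answer = answer.removeprefix(random_string_to_prepend)
    return float(SequenceMatcher(None, response, answer).ratio())
```

The prefix check makes the metric strict about instruction following: a perfect reproduction without the random string still scores 0.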

Kipok · 2025-08-06T23:43:13Z

Yes, that should be supported with the parameters I shared.
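Concretely, that means storing the decoded message list, not the JSON string, under the "messages" key of each data line, and passing ++prompt_format=openai at generation time. A minimal sketch of the per-row conversion, assuming the OpenAI MRCR row fields prompt, answer and random_string_to_prepend; the function name and the output keys other than "messages" (expected_answer, random_string_to_prepend) are hypothetical, chosen for illustration.

```python
import json


def to_messages_record(row):
    # Decode once; the result is a list of {"role": ..., "content": ...} dicts.
    messages = json.loads(row["prompt"])
    return {
        "messages": messages,
        "expected_answer": row["answer"],  # hypothetical key name
        "random_string_to_prepend": row["random_string_to_prepend"],
    }
```

Each returned dict can be written as one line of the split's .jsonl file with json.dumps.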


Kipok

Looks great, thanks!

Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>


Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

fayejf and others added 7 commits July 31, 2025 10:24

prepare mrcr data

3b9aaaf

Signed-off-by: fayejf <fayejf07@gmail.com>

init loc

58fca84

Signed-off-by: fayejf <fayejf07@gmail.com>

eval

c886242

Signed-off-by: fayejf <fayejf07@gmail.com>

change default max

ad8f095

Signed-off-by: fayejf <fayejf07@gmail.com>

test_datasets

a6a6231

Signed-off-by: fayejf <fayejf07@gmail.com>

readme

94732ca

Signed-off-by: fayejf <fayejf07@gmail.com>

Merge branch 'main' into fayejf/mrcr

94e93bd

fayejf requested a review from Kipok August 1, 2025 05:25

Kipok reviewed Aug 1, 2025


fayejf and others added 7 commits August 4, 2025 16:31

Update tests/test_datasets.py

a4e0c67

revert test Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

Update nemo_skills/dataset/mrcr/prepare.py

e6d5437

Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

Update nemo_skills/dataset/mrcr/prepare.py

5746b33

Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

Update nemo_skills/dataset/mrcr/prepare.py

d673cab

Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>

move install and import tiktoken

f1301a0

Signed-off-by: fayejf <fayejf07@gmail.com>

revert dynamically override __init__.py. leave instruction

4c36e45

Signed-off-by: fayejf <fayejf07@gmail.com>

Merge branch 'main' into fayejf/mrcr

c6f0c46

fayejf requested a review from Kipok August 5, 2025 16:22

Fix import error

08c77af

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Kipok reviewed Aug 6, 2025


nemo_skills/dataset/mrcr/__init__.py

fayejf and others added 2 commits August 6, 2025 16:06

Merge branch 'main' into fayejf/mrcr

cbf3ae9

fix init

15ee906

Signed-off-by: fayejf <fayejf07@gmail.com>

fayejf and others added 4 commits August 8, 2025 14:13

update prompt_format to openai

db200e8

Signed-off-by: fayejf <fayejf07@gmail.com>

PROMPT CONFIG None

176277c

Signed-off-by: fayejf <fayejf07@gmail.com>

Merge branch 'main' into fayejf/mrcr

b5c993f

fix

2ee1a57

Signed-off-by: fayejf <fayejf07@gmail.com>

Kipok reviewed Aug 11, 2025


nemo_skills/dataset/mrcr/prepare.py (outdated)

remove convert string

18027bd

Signed-off-by: fayejf <fayejf07@gmail.com>

Kipok approved these changes Aug 11, 2025


Kipok merged commit 2c84e05 into main Aug 11, 2025
4 checks passed

fayejf deleted the fayejf/mrcr branch August 11, 2025 21:50

Kipok merged 22 commits into main from fayejf/mrcr
