Walkthrough
Replaced the ioi24 subsection with a single IOI section documenting IOI24/IOI25: dataset preparation via
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor U as User
participant NS as ns CLI
participant DS as Dataset Store
participant SL as Slurm Scheduler
participant EV as Evaluator
participant RS as Results/Logs
rect rgb(235,245,255)
note over U,NS: Data preparation (IOI24/IOI25)
U->>NS: ns prepare_data ioi24
NS->>DS: fetch & prepare IOI artifacts
DS-->>NS: prepared dataset path
NS-->>U: prints prepared-data path
end
rect rgb(240,255,240)
note over U,NS: Evaluation (multi-solution)
U->>NS: ns eval --cluster=... --benchmarks=ioi24:50 ...
alt Slurm
NS->>SL: submit evaluation jobs
SL->>EV: start evaluator tasks
else Local
NS->>EV: run evaluator locally
end
EV->>DS: load prepared data
EV->>EV: generate N solutions per subtask
EV->>RS: write metrics, logs, artifacts
RS-->>U: results path for verification
end
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/evaluation/code.md (2)
378-405: Add shell language hints to command snippets. Please tag these fenced blocks as `bash` (or `shell`) so rendered docs get syntax highlighting and downstream linters stop flagging them.
371-372: Use descriptive link text. Replace the bare “here” with something like “IOI24 dataset on Hugging Face” to satisfy MD059 and improve accessibility.
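The fenced-code-language finding (MD040) raised above can be approximated locally without running markdownlint. The following is a minimal sketch, not markdownlint's implementation: the function name `fences_missing_language` is hypothetical, and it only handles ordinary backtick or tilde fences indented by at most three spaces.

```python
import re

# An opening or closing fence: up to 3 spaces, then 3+ backticks or tildes,
# then an optional info string (the language hint on an opening fence).
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})(.*)$")


def fences_missing_language(markdown_text):
    """Return 1-based line numbers of opening fences that declare no language."""
    missing = []
    open_fence = None  # (fence character, fence length) while inside a block
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE_RE.match(line)
        if not match:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            # Opening fence: record it and flag it if the info string is empty.
            open_fence = (marker[0], len(marker))
            if not info:
                missing.append(lineno)
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not info:
            # Closing fence: same character, at least as long, no info string.
            open_fence = None
    return missing


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime so this example can itself sit in a fence
    sample = "\n".join(
        [fence, "ns prepare_data ioi24", fence, "", fence + "bash", "ns eval", fence]
    )
    print(fences_missing_language(sample))  # prints [1]
```

Running it against `docs/evaluation/code.md` before pushing should list the same opening-fence lines that the linter reports, assuming the file contains no unusual fence nesting.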
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/evaluation/code.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
367-367: Link text should be descriptive
(MD059, descriptive-link-text)
371-371: Link text should be descriptive
(MD059, descriptive-link-text)
377-377: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
389-389: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/evaluation/code.md (2)
371-372: Use descriptive link text for accessibility. Replace the bare “here” link text with something meaningful like “IOI24 dataset on Hugging Face” so screen readers convey the destination. Based on static analysis hints.
377-405: Annotate shell snippets with their language. Please add a language like `bash` to the fenced code blocks so tooling and syntax highlighting work correctly. Based on static analysis hints.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/evaluation/code.md (3)
371-372: Use descriptive link text for accessibility. Replace “here” with meaningful link text so screen readers convey where the URL leads.
-- Original benchmark source is [here](https://huggingface.co/datasets/open-r1/ioi).
+- Original benchmark source is the [Open-R1 IOI dataset on Hugging Face](https://huggingface.co/datasets/open-r1/ioi).
377-379: Add a language identifier to the CLI code fence. Specify the shell language for proper syntax highlighting and lint compliance.
-```
+```bash
 ns prepare_data ioi24
389-405: Add the shell language to the eval command fence. Mark the fence as bash to improve readability and satisfy markdown linting.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
     --server_type=vllm \
     --server_args="--async-scheduling" \
     --server_nodes=1 \
     --server_gpus=8 \
     --benchmarks=ioi24:50 \
     --with_sandbox \
     --split=test \
     --data_dir=<DATA_DIR> \
     --output_dir=<OUTPUT_DIR> \
     --extra_eval_args="++eval_config.test_file=<PATH_TO_METADATA_TEST_FILE>" \
     ++inference.temperature=0.6 \
     ++inference.top_p=0.95 \
     ++inference.tokens_to_generate=65536
📜 Review details
📥 Commits
Reviewing files that changed from the base of the PR and between 8be70d991c74f44e1235ca5a96891df490dba36b and 04a6eede4e33a4f7291fb4d0c3f1ed092326d07d.
📒 Files selected for processing (1)
docs/evaluation/code.md (1 hunks)
Actionable comments posted: 0
🧹 Nitpick comments (4)
docs/evaluation/code.md (4)
181-184: Add code block language specifiers and use descriptive link text. The IOI section has markdown linting violations. The code block starting at line 189 is missing a language specifier, and link text at line 183 should be descriptive rather than "here".
Apply this diff:
 ### IOI
-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [in the open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).
189-191: Specify code block language. The fenced code block should declare a language for syntax highlighting.
Apply this diff:
 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.
-```
+```bash
 ns prepare_data ioi24
-```
+```
201-218: Specify code block language for the evaluation command. The fenced code block should declare a language for syntax highlighting.
Apply this diff:
 This command runs an evaluation of [OpenReasoning-Nemotron-32B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-32B) on a Slurm cluster.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
224-229: Specify code block language for the results output. The fenced code block showing expected output should declare a language for consistency.
Apply this diff:
 After all jobs are complete, you can check the results in `<OUTPUT_DIR>/eval-results/ioi24/metrics.json`. You can also take a look at `<OUTPUT_DIR>/eval-results/ioi24/summarized-results/main_*`. They should look something like this:
-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
📜 Review details
📒 Files selected for processing (1)
docs/evaluation/code.md(2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
183-183: Link text should be descriptive
(MD059, descriptive-link-text)
189-189: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
201-201: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
224-224: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
414-414: Link text should be descriptive
(MD059, descriptive-link-text)
🔇 Additional comments (1)
docs/evaluation/code.md (1)
411-414: Verify intentional removal of benchmark definition line in human-eval-infilling. The human-eval-infilling section now differs structurally from other benchmarks: the "Benchmark is defined in..." line was removed, leaving only the original benchmark source link. Confirm this removal was intentional and aligns with documentation standards.
Actionable comments posted: 0
🧹 Nitpick comments (4)
docs/evaluation/code.md (4)
180-190: Add language specifier to code block and use descriptive link text. Per markdown best practices, code blocks should specify the language and links should use descriptive text instead of "here".
-We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+We currently support IOI24 and are working to support IOI25 for evaluation. The original data for IOI24 can be seen in the [open-r1/ioi dataset](https://huggingface.co/datasets/open-r1/ioi).
 
 #### Data Preparation
 
 First, prepare the dataset by running the `ns prepare_data` command. The arguments below will generate `test.jsonl` and `test_metadata.json`.
-```
+```bash
 ns prepare_data ioi24
200-217: Add language specifier to ns eval command block. Specify `bash` as the language for the code block to improve formatting and readability.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
223-228: Add language specifier to results output block. Specify the language for the code block to maintain consistency with other sections.
-```
+```
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode | num_entries | avg_tokens | gen_seconds | correct | total_score | round_robin_score
410-413: Use descriptive link text instead of "here". Replace the generic "here" with text that describes the link destination.
-
-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+
+- Original benchmark source is at [openai/human-eval-infilling](https://github.com/openai/human-eval-infilling).
📜 Review details
📒 Files selected for processing (1)
docs/evaluation/code.md(2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
182-182: Link text should be descriptive
(MD059, descriptive-link-text)
188-188: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
200-200: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
223-223: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
413-413: Link text should be descriptive
(MD059, descriptive-link-text)
🔇 Additional comments (1)
docs/evaluation/code.md (1)
180-229: Excellent IOI documentation additions. The new IOI section is well-structured and comprehensive, covering data preparation, evaluation with multi-solution settings, and results verification. It directly addresses previous feedback about including expected output examples. The typo fix (METADATA_TEST_FILE) from the earlier review has been properly applied. The section follows established patterns from other benchmarks and provides clear, actionable instructions for users.
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
0c447fa to ca6d3d8
Actionable comments posted: 0
🧹 Nitpick comments (5)
docs/evaluation/code.md (5)
182-182: Make link text descriptive. Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text that describes the link target, e.g., "IOI24 dataset on Hugging Face" or similar.
-The original data for IOI24 can be seen [here](https://huggingface.co/datasets/open-r1/ioi).
+The original data for IOI24 can be seen in the [open-r1 IOI dataset](https://huggingface.co/datasets/open-r1/ioi).
188-190: Add language identifier to code block. Specify `bash` as the language for syntax highlighting.
-```
+```bash
 ns prepare_data ioi24
-```
+```
200-217: Add language identifier to code block. Specify `bash` as the language for syntax highlighting.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
-```
+```
223-228: Add language identifier to code block. Specify a language (e.g., `text` or `plain`) for the example output block.
-```
+```text
 ------------------------------------------------------ ioi24 ------------------------------------------------------
 evaluation_mode   | num_entries | avg_tokens | gen_seconds | correct       | total_score | round_robin_score
 pass@1[avg-of-50] | 39          | 40387      | 7410        | 0.51% ± 1.04% | 303.47      | 261.01
 pass@50           | 39          | 40387      | 7410        | 2.56%         | 303.47      | 261.01
-```
+```
413-413: Make link text descriptive. Use descriptive, meaningful text that clearly indicates the link destination. Avoid generic phrases that provide no context about where the link leads. Replace the generic "here" with text describing the benchmark source, e.g., "human-eval-infilling repository" or similar.
-- Original benchmark source is [here](https://github.com/openai/human-eval-infilling).
+- Original benchmark source is the [human-eval-infilling repository](https://github.com/openai/human-eval-infilling).
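The descriptive-link-text findings (MD059) in this review all follow the same pattern, so they can be spotted with a short script before the linter runs. This is a rough sketch under stated assumptions: `generic_links` is a hypothetical helper, the set of "generic" phrases is an assumption that may not match markdownlint's configured list, and only inline `[text](url)` links are considered.

```python
import re

# Inline Markdown links of the form [text](url); images (![alt](url)) are skipped.
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")

# Assumed set of non-descriptive phrases; adjust to match your linter config.
GENERIC_TEXTS = {"here", "click here", "link", "more"}


def generic_links(markdown_text):
    """Return (line_number, link_text, url) for links whose text is generic."""
    hits = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for text, url in LINK_RE.findall(line):
            if text.strip().lower() in GENERIC_TEXTS:
                hits.append((lineno, text, url))
    return hits


if __name__ == "__main__":
    before = "- Original benchmark source is [here](https://github.com/openai/human-eval-infilling)."
    after = (
        "- Original benchmark source is the "
        "[human-eval-infilling repository](https://github.com/openai/human-eval-infilling)."
    )
    print(generic_links(before))  # one hit, on line 1
    print(generic_links(after))   # prints []
```

The `before` and `after` strings are the two sides of the suggested diff for line 413; the rewritten link no longer triggers the check.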
📜 Review details
📒 Files selected for processing (1)
docs/evaluation/code.md(2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
182-182: Link text should be descriptive
(MD059, descriptive-link-text)
188-188: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
200-200: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
223-223: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
413-413: Link text should be descriptive
(MD059, descriptive-link-text)
🔇 Additional comments (1)
docs/evaluation/code.md (1)
180-229: Comprehensive IOI documentation with clear workflow. The section provides a complete walkthrough: data preparation, evaluation execution with a realistic example, and result verification including expected output. This addresses prior feedback effectively and gives users clear guidance on IOI24/IOI25 evaluation.
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit