Conversation
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
… into feat/update_ioi
📝 Walkthrough

This pull request refactors the IOI evaluation system to support input-file-driven testing and local-based precompilation. Changes include: enabling shared runtime directory mounting in CI, updating IOI dataset preparation with suffix parameters, removing the IOI25 dataset configuration, modifying evaluator workflows to use local file-based compilation, enhancing metrics with clustering and per-subtask scoring, and updating documentation accordingly.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant IOIEvaluator
    participant LocalFS as Local FS<br/>(/nemo_run)
    participant Sandbox
    participant IOIMetrics
    User->>IOIEvaluator: eval_full(input_files)
    IOIEvaluator->>IOIEvaluator: Load input_data from file
    IOIEvaluator->>IOIEvaluator: Initialize runtime
    loop For each input_file
        IOIEvaluator->>IOIEvaluator: run_input_case()
        IOIEvaluator->>LocalFS: Create unique run directory
        IOIEvaluator->>LocalFS: Copy precompiled grader/artifacts
        IOIEvaluator->>LocalFS: Write contestant solution
        IOIEvaluator->>LocalFS: Write test inputs
        IOIEvaluator->>Sandbox: Execute compile.sh
        Sandbox-->>IOIEvaluator: Compilation result (compile_output, errors)
        IOIEvaluator->>Sandbox: Execute run.sh with inputs
        Sandbox-->>IOIEvaluator: Run result (stdout, stderr, exit code)
        IOIEvaluator->>IOIEvaluator: Aggregate into input_case_results
    end
    IOIEvaluator->>IOIMetrics: Pass submissions with results
    IOIMetrics->>IOIMetrics: extract_info() per submission
    IOIMetrics->>IOIMetrics: get_clusters() group by stdout
    IOIMetrics->>LocalFS: Write cluster JSONL
    IOIMetrics->>IOIMetrics: get_problem_score() per-subtask
    IOIMetrics-->>User: Return metrics & per-subtask scoring
```
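The per-input-case loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual nemo_skills implementation: the `run_input_case` signature, the `compile.sh`/`run.sh` script names, and the `solution.cpp` filename are all assumptions, and a local temp directory stands in for the `/nemo_run` mount.

```python
import os
import shutil
import subprocess
import tempfile


def run_input_case(precompiled_dir, solution_src, input_text):
    """Create a unique run directory, copy precompiled artifacts into it,
    write the solution and input, then compile and run."""
    run_dir = tempfile.mkdtemp(prefix=f"ioi_run_{os.getpid()}_")
    # Copy precompiled grader/artifacts (assumed to be plain files).
    for name in os.listdir(precompiled_dir):
        shutil.copy(os.path.join(precompiled_dir, name), run_dir)
    with open(os.path.join(run_dir, "solution.cpp"), "w") as f:
        f.write(solution_src)
    input_path = os.path.join(run_dir, "input.txt")
    with open(input_path, "w") as f:
        f.write(input_text)
    # Compile step; a non-zero exit short-circuits the run step.
    compile_res = subprocess.run(
        ["bash", "compile.sh"], cwd=run_dir, capture_output=True, text=True
    )
    if compile_res.returncode != 0:
        return {"compile_error": compile_res.stderr, "run_stdout": "", "exit_code": -1}
    # Run step with the test input on stdin, capturing raw stdout.
    with open(input_path) as fin:
        run_res = subprocess.run(
            ["bash", "run.sh"], cwd=run_dir, stdin=fin,
            capture_output=True, text=True,
        )
    return {"compile_error": "", "run_stdout": run_res.stdout,
            "exit_code": run_res.returncode}
```

The real evaluator additionally aggregates these per-case dicts into `input_case_results` before handing them to the metrics layer.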
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Key areas requiring focused attention:
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/evaluation/code.md (1)
202-220: Fix IOI eval command formatting and add fenced languages

Two small issues here:

- The line with `\ # set the folder...` isn't a valid shell line continuation if copy‑pasted.
- Both this command block and the metrics example block are missing language annotations (MD040).
Suggested adjustment:

````diff
-```
-ns eval \
+```bash
+ns eval \
@@
-    --output_dir=<OUTPUT_DIR> \
-    --eval_subfolder=eval-results/ioi24/ \ # set the folder if you want to differentiate subsets.
+    --output_dir=<OUTPUT_DIR> \
+    --eval_subfolder=eval-results/ioi24/ \
+    # set the folder if you want to differentiate subsets.
     --extra_eval_args="++eval_config.test_file=<PATH_TO_METADATA_TEST_DIR>/ioi24_metadata.json" \
@@
-```
+```
````

And for the metrics table:

````diff
-```
+```text
 ----------------------------------
 ioi
 -----------------------------------
@@
-Sphinx's Riddle | 235 | 48 | 235.00
-```
+Sphinx's Riddle | 235 | 48 | 235.00
+```
````

This keeps the examples copy‑pasteable and satisfies the linter.
Also applies to: 224-236
nemo_skills/evaluation/evaluator/icpc.py (1)
456-460: eval_full assumes eval_status that _evaluate_entry doesn't provide

Here you assign:

```python
for s, o in zip(all_samples, outputs):
    s["test_case_results"] = o["test_case_results"]
    s["input_case_results"] = o["input_case_results"]
    s["eval_status"] = o["eval_status"]
```

But `_evaluate_entry` currently returns only `name`, `test_case_results`, and `input_case_results`, so `o["eval_status"]` will raise a `KeyError` as soon as `eval_full` is used.

If you don't rely on `eval_status` downstream, a minimal fix is:

```diff
-for s, o in zip(all_samples, outputs):
-    s["test_case_results"] = o["test_case_results"]
-    s["input_case_results"] = o["input_case_results"]
-    s["eval_status"] = o["eval_status"]
+for s, o in zip(all_samples, outputs):
+    s["test_case_results"] = o["test_case_results"]
+    s["input_case_results"] = o["input_case_results"]
+    if "eval_status" in o:
+        s["eval_status"] = o["eval_status"]
```

Alternatively, you could add an `eval_status` field to `_evaluate_entry`'s return and keep this assignment. As written, this is a runtime error path.
🧹 Nitpick comments (6)
nemo_skills/dataset/ioi/prepare.py (1)
30-31: Suffix-based IOI filenames are consistent; consider factoring the base name

The new `--suffix` argument and `ioi{args.suffix}*.{jsonl,json}` filenames line up with the IOI24 docs and downstream evaluator expectations.

If you expect to vary the suffix (e.g., IOI25), you might slightly simplify by factoring the base once:

```diff
-    args = parser.parse_args()
+    args = parser.parse_args()
+    split_name = f"ioi{args.suffix}"
 ...
-    with open(os.path.join(data_dir, f"ioi{args.suffix}.jsonl"), "w") as f:
+    with open(os.path.join(data_dir, f"{split_name}.jsonl"), "w") as f:
 ...
-    with open(os.path.join(data_dir, f"ioi{args.suffix}_metadata.json"), "w") as f:
+    with open(os.path.join(data_dir, f"{split_name}_metadata.json"), "w") as f:
```

Purely optional; current code is otherwise fine.
Also applies to: 54-87
docs/evaluation/code.md (1)
188-192: Add a language to the IOI prepare_data code block

To satisfy markdownlint (MD040) and improve syntax highlighting, it's better to tag this block as shell:

````diff
-```
-ns prepare_data ioi
-```
+```bash
+ns prepare_data ioi
+```
````

The surrounding text about generating `ioi24.jsonl`/`ioi24_metadata.json` is consistent with the new `--suffix` default in `prepare.py`.

nemo_skills/evaluation/metrics/ioi_metrics.py (3)
29-34: Preserve BaseMetrics kwargs and avoid unconditional printing in IOIMetrics.__init__

`__init__(self, **kwargs)` currently drops any arguments meant for `BaseMetrics` (e.g., `max_k`) because `super().__init__()` is called without them, and it always prints the cluster folder.

Consider something like:

```diff
-    def __init__(self, **kwargs):
-        super().__init__()
-        self.reset()
-        self.cluster_folder = kwargs.get("cluster_folder", None)
-        print(f"Cluster folder: {self.cluster_folder}")
+    def __init__(self, **kwargs):
+        self.cluster_folder = kwargs.pop("cluster_folder", None)
+        super().__init__(**kwargs)
+        self.reset()
```

and, if you still want visibility, log `cluster_folder` via your logging setup instead.
53-90: Clustering logic looks good; consider using a clearer identifier and extension for cluster files

The grouping by `run_stdout` plus per‑subtask `max_score`/`max_score_solutions` updates looks sound.

Two minor nits around the output:

- `id` is taken from `submission.get("id", id)` on each iteration, so the final value is effectively the last non‑missing id. If problems can have multiple ids or some submissions lack it, the filename `"{id}_cluster.jsonl"` may be surprising or collide across problems.
- You're writing a single JSON object with `json.dump`, but the extension is `.jsonl`.

If useful, you could instead do:

```diff
-        clusters, id = self.get_clusters(submissions)
+        clusters, last_id = self.get_clusters(submissions)
@@
-        output_file = os.path.join(self.cluster_folder, f"{id}_cluster.jsonl")
+        safe_name = name.replace("/", "_")
+        output_file = os.path.join(self.cluster_folder, f"{safe_name}_{last_id}_clusters.json")
```

This is optional polish; current behavior is functionally fine.
Also applies to: 107-130
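The stdout-based grouping discussed above can be illustrated with a toy version. Only the `run_stdout` field name comes from the review; the function body here is a hypothetical sketch, not the actual `IOIMetrics.get_clusters`.

```python
from collections import defaultdict


def get_clusters(submissions):
    """Group submissions whose runs produced identical stdout into the
    same cluster, keyed by the raw output string."""
    clusters = defaultdict(list)
    for sub in submissions:
        clusters[sub.get("run_stdout", "")].append(sub)
    return dict(clusters)
```

Grouping on the raw stdout (rather than a hash) is what makes the clusters directly inspectable in the dumped JSONL files.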
92-106: Align get_problem_score signature/docs with its tuple return and consider total_score casting
`get_problem_score` now returns `(score, subtasks)` but is annotated and documented as returning just a float:

```python
def get_problem_score(self, submissions) -> float:
    ...
    return sum(subtask_scores.values()), subtask_scores
```

Since callers rely on unpacking `(score, subtasks)`, it'd be clearer to update both the return annotation and docstring accordingly, e.g.:

```diff
-    def get_problem_score(self, submissions) -> float:
+    def get_problem_score(self, submissions) -> tuple[float, dict]:
@@
-        For a given problem (list of submissions), compute the score as follows:
+        For a given problem (list of submissions), return:
+        - total score: sum of per‑subtask maxima across submissions
+        - per‑subtask score dict
```

Also, `m["total_score"] = int(total_score)` will truncate if any subtask scores are fractional. If IOI scoring guarantees integer totals this is fine; otherwise, you may want to preserve the float.

The new `per_problem_subtask_scores` and `print_problem_scores` wiring looks consistent with the evaluator output shape.

Also applies to: 135-156, 164-173
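The scoring rule in question — the problem total is the sum, over subtasks, of the best score any submission achieved on that subtask — can be sketched standalone. The `subtask_scores` field name is an assumption made for illustration; the real method reads its inputs from the evaluator output shape.

```python
def get_problem_score(submissions):
    """Return (total, per_subtask_best): per-subtask maxima across all
    submissions, plus their sum as the problem score."""
    best = {}
    for sub in submissions:
        for subtask, score in sub["subtask_scores"].items():
            best[subtask] = max(best.get(subtask, 0), score)
    return sum(best.values()), best
```

This is why two partial submissions can jointly outscore either one alone: each subtask's maximum is taken independently.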
nemo_skills/evaluation/evaluator/ioi.py (1)
202-269: IOI input_runtime flow is solid; verify a few assumptions (unroll_files, input_file format, worker count)

Overall the input‑driven path looks good: `run_input_case` uses `run_files` to reconstruct the toolchain, you load `input_file` once in `_initialize_runtime`, and `eval_full` leverages `unroll_files` to handle globs.

A few details worth double‑checking / possibly tightening:

- run_files expectations (lines 202‑247): `run_input_case` assumes that `run_files` contains scripts whose basenames are `compile` and `run`, and then hard‑codes `./compile` and `./run < input.txt`. If future metadata ships scripts named `compile.sh`/`run.sh` or in subdirectories, this will break. If there's variability here, consider:
  - using the exact filenames from `run_files` instead of hard‑coded `compile`/`run`, or
  - normalizing them at data‑prep time.
- Exception handling in run_input_case (lines 262‑269): catching `Exception` and returning only strings is convenient but can make debugging harder. Even brief logging in the `except` block would help trace failures without changing the external schema.
- Pool sizing vs. config (lines 347‑350): `multiprocessing.Pool` uses `processes=self.eval_cfg.test_batch_size`; the `num_workers` config field is currently unused. If you intend `num_workers` to control process count independently of batch size, you may want:

  ```python
  processes = self.eval_cfg.num_workers or self.eval_cfg.test_batch_size
  pool_local = multiprocessing.Pool(processes=processes, initializer=init_worker)
  ```

- unroll_files import (lines 27‑28, 476‑489): here `unroll_files` is imported from `nemo_skills.utils`, while the provided snippet shows its definition in `nemo_skills/file_utils.py`. Please confirm that `nemo_skills.utils` re‑exports `unroll_files`; otherwise this import will fail at runtime and should be switched to `from nemo_skills.file_utils import jdump, unroll_files`.
- input_file structure (lines 339‑347, 445‑467): you index `self.input_data` with `str(entry["id"])`. That's fine as long as the input JSON is keyed by stringified dataset ids; worth ensuring the generator for that file matches this convention.

None of these are blockers, but they're good to validate before relying on this path heavily.
Also applies to: 318-355, 476-489, 27-28
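One possible shape for the first suggestion — resolving script names from `run_files` rather than hard-coding `./compile` and `./run` — is sketched below. The `resolve_scripts` helper and its contract are invented here for illustration; they do not exist in nemo_skills.

```python
import os


def resolve_scripts(run_files):
    """Map a logical step name ('compile' or 'run') to the actual script
    path, tolerating both bare names and .sh extensions as well as
    scripts nested in subdirectories."""
    scripts = {}
    for path in run_files:
        # Strip any extension and directory so 'tools/compile.sh' and
        # 'compile' both resolve to the 'compile' step.
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem in ("compile", "run"):
            scripts[stem] = path
    return scripts
```

The evaluator could then invoke `scripts["compile"]` and `scripts["run"]` instead of fixed paths, making the metadata format more forgiving.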
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- .github/workflows/tests.yml (1 hunks)
- docs/evaluation/code.md (2 hunks)
- nemo_skills/dataset/ioi/prepare.py (3 hunks)
- nemo_skills/dataset/ioi25/__init__.py (0 hunks)
- nemo_skills/evaluation/evaluator/icpc.py (1 hunks)
- nemo_skills/evaluation/evaluator/ioi.py (8 hunks)
- nemo_skills/evaluation/metrics/ioi_metrics.py (2 hunks)
💤 Files with no reviewable changes (1)
- nemo_skills/dataset/ioi25/__init__.py
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/evaluation/evaluator/ioi.py (2)
nemo_skills/file_utils.py (1)
- unroll_files (21-32)

nemo_skills/utils.py (1)
- nested_dataclass (69-102)
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
190-190: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
226-226: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.7)
nemo_skills/evaluation/evaluator/ioi.py
114-114: os.chmod setting a permissive mask 0o755 on file or directory
(S103)
119-119: os.chmod setting a permissive mask 0o755 on file or directory
(S103)
198-199: try-except-pass detected, consider logging the exception
(S110)
198-198: Do not catch blind exception: Exception
(BLE001)
216-216: os.chmod setting a permissive mask 0o755 on file or directory
(S103)
260-260: Consider moving this statement to an else block
(TRY300)
262-262: Do not catch blind exception: Exception
(BLE001)
341-344: Avoid specifying long messages outside the exception class
(TRY003)
462-462: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
484-484: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (3)
.github/workflows/tests.yml (1)
88-91: CI /nemo_run setup and mount look appropriate

Creating `/nemo_run` and mounting it into `nemo-skills-sandbox-image` matches the new IOI/ICPC evaluators' use of that shared filesystem. In the ephemeral GitHub runner context, `chmod 777` here is acceptable and there's no additional cleanup needed.

nemo_skills/evaluation/evaluator/icpc.py (1)
239-251: Using raw run_stdout for input cases is reasonable

Switching `run_stdout` from a hash to the raw stdout improves debuggability and supports clustering/grouping on actual outputs. With `max_output_characters=1_000_000` the per‑run size is bounded; just be aware that JSONL files for large input sets will grow accordingly, which seems acceptable given the use case.

nemo_skills/evaluation/evaluator/ioi.py (1)
87-124: /nemo_run‑based precompile and run directories for IOI look consistent

The new `_precompile_grader` and `run_test_case` paths that write under `/nemo_run/ioi_pre_*` and `/nemo_run/ioi_run_*` align with the CI changes that mount `/nemo_run` into the sandbox container. Using `os.getpid()` and `time.time_ns()` in the directory names should avoid collisions across workers, and copying precompiled assets per run keeps the checker/grader reuse clear.

No blocking issues here; the structure mirrors the ICPC workflow in a reasonable way.
Also applies to: 127-135, 129-135
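The collision-avoidance naming scheme mentioned above can be demonstrated in isolation. This is a sketch only: the real evaluator writes under `/nemo_run` with `ioi_pre_*`/`ioi_run_*` prefixes, whereas here an arbitrary base directory is used.

```python
import os
import time


def unique_run_dir(base, prefix="ioi_run"):
    # Combine the pid with a nanosecond timestamp so that concurrent
    # workers, and back-to-back calls within one process, get distinct
    # directories. os.makedirs raises if a collision ever did occur.
    path = os.path.join(base, f"{prefix}_{os.getpid()}_{time.time_ns()}")
    os.makedirs(path)
    return path
```

Because the pid disambiguates across workers and the timestamp disambiguates within a worker, no cross-process locking is needed.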
… into feat/update_ioi
This reverts commit 9985b2e. Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: dlord <dlord@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Updates IOI with fixes from ICPC.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Chores