Fix input_file path handling when executor is "none"#1089
Fix input_file path handling when executor is "none"#1089gwarmstrong merged 2 commits intoNVIDIA-NeMo:mainfrom
Conversation
📝 WalkthroughWalkthroughThe change updates the condition in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_skills/pipeline/utils/eval.py (1)
93-101: Executor"none"behavior matches intent; consider separating it from the mounted‑path branchThe added
or cluster_config["executor"] == "none"ensuresinput_fileis built directly fromdata_pathwhen there is no container, so you avoid the/nemo_run/codeprefix and the later"/nemo_run/code" -> "./"replacement. That directly addresses the bug described in the PR.One subtle point: in this branch you still run the same “mounted path” logic:
input_fileisf"{data_path}/.../{split}.jsonl".unmounted_pathis derived viaget_unmounted_path(...)andPath(__file__).parents[3] / ....For absolute
data_paththis is effectively a no-op and behaves as desired. For relativedata_path, though,input_filewill be resolved relative to the current working directory, while the existence check will useunmounted_pathresolved relative toPath(__file__).parents[3](likely the repo root). If users ever pass a relativedata_pathunderexecutor="none", this could be surprising.You might want to make the
"none"case explicit and treat it as purely local (no mount transforms), which also makes the intent clearer:- if not is_on_cluster: - if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none": - input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl" - unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file) - unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", "")) - else: - # will be copied over in this case as it must come from extra datasets - input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl" - unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl" + if not is_on_cluster: + if cluster_config["executor"] == "none": + # Local, no container: use the provided data_path directly + input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl" + unmounted_path = input_file + elif pipeline_utils.is_mounted_filepath(cluster_config, data_path): + input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl" + unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file) + unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", "")) + else: + # will be copied over in this case as it must come from extra datasets + input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl" + unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl"This keeps the new behavior while making the
"none"semantics unambiguous and avoiding any dependency on mount-related helpers in the non-container case.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/pipeline/utils/eval.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/pipeline/utils/eval.py (1)
nemo_skills/pipeline/utils/mounts.py (1)
is_mounted_filepath(27-46)
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
c8db1b1 to
779b5a9
Compare
Docstrings generation was requested by @bzantium. * #1089 (comment) The following files were modified: * `nemo_skills/dataset/utils.py` * `nemo_skills/pipeline/utils/eval.py`
b725bcb to
7bc29e7
Compare
…th directly instead of prepending /nemo_run/code/ Signed-off-by: bzantium <ryumin93@gmail.com>
7bc29e7 to
499b9bd
Compare
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary
Fixes incorrect
input_filepath construction when running evaluation withexecutor="none"(no container execution). Whenexecutor="none", we're already in the actual working environment, so we should use the provideddata_pathdirectly without container-specific path transformations.Problem
When
executor="none", the code incorrectly constructsinput_filepaths by prepending/nemo_run/code/instead of using the provideddata_pathdirectly. This causes:eval.py) that setsinput_file = f"/nemo_run/code/{Path(data_path).name}/..."exp.pyline 673,/nemo_run/codegets replaced with./for non-container execution.//{Path(data_path).name}/...that cause file not found errors during evaluationSolution
Ensure that when
executor="none", the code usesdata_pathdirectly without the/nemo_run/code/prefix. Theinput_fileshould be constructed as:This avoids container-specific path transformations that are not applicable when running without containers.
Changes
get_benchmark_args_from_module()innemo_skills/pipeline/utils/eval.pyto properly handleexecutor="none"caseinput_fileusesdata_pathdirectly whenexecutor="none"instead of constructing container-specific pathsTesting
executor="none"and customdata_pathinput_filepaths are correctly constructed without/nemo_run/code/prefixRelated Issues
Fixes #1088
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.