Fix input_file path handling when executor is "none" by bzantium · Pull Request #1089 · NVIDIA-NeMo/Skills

bzantium · 2025-12-10T03:42:26Z

Summary

Fixes incorrect input_file path construction when running evaluation with executor="none" (no container execution). When executor="none", we're already in the actual working environment, so we should use the provided data_path directly without container-specific path transformations.

Problem

When executor="none", the code incorrectly constructs input_file paths by prepending /nemo_run/code/ instead of using the provided data_path directly. This causes:

- Incorrect path construction: The code falls into the else branch (lines 98-100 in eval.py) that sets input_file = f"/nemo_run/code/{Path(data_path).name}/..."
- Path transformation issue: In exp.py line 673, /nemo_run/code gets replaced with ./ for non-container execution
- Malformed paths: This creates malformed paths like .//{Path(data_path).name}/... that cause file not found errors during evaluation
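
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual NeMo-Skills code: `buggy_input_file` is a hypothetical helper that chains the two string operations named in the list, and the dataset path and benchmark name are made-up examples.

```python
from pathlib import PurePosixPath


def buggy_input_file(data_path: str, benchmark: str, split: str) -> str:
    """Hypothetical helper that mimics the two steps described above."""
    # Step 1: container-style path, as built by the else branch in eval.py.
    container_path = (
        f"/nemo_run/code/{PurePosixPath(data_path).name}/"
        f"{benchmark.replace('.', '/')}/{split}.jsonl"
    )
    # Step 2: the later rewrite for non-container execution, as described for exp.py.
    return container_path.replace("/nemo_run/code", "./")


# Example values are assumptions chosen only for illustration.
malformed = buggy_input_file("/home/user/my_datasets", "gsm8k", "test")
print(malformed)  # .//my_datasets/gsm8k/test.jsonl
```

Note that the result no longer contains the parent directory of the original data_path, so the file is looked up relative to the current directory rather than where the data actually lives.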

Solution

Ensure that when executor="none", the code uses data_path directly without the /nemo_run/code/ prefix. The input_file should be constructed as:

input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"

This avoids container-specific path transformations that are not applicable when running without containers.
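
As a minimal sketch of the intended behavior, the expression above can be wrapped in a small function. The name `build_input_file` and the example arguments are hypothetical and only serve to show how a dotted benchmark name maps onto nested directories.

```python
def build_input_file(data_path: str, benchmark: str, split: str) -> str:
    """Hypothetical wrapper around the expression from the Solution section."""
    # Dots in the benchmark name become directory separators.
    return f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"


# Example values are assumptions chosen only for illustration.
local_input = build_input_file("/home/user/my_datasets", "math.algebra", "test")
print(local_input)  # /home/user/my_datasets/math/algebra/test.jsonl
```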

Changes

- Modified get_benchmark_args_from_module() in nemo_skills/pipeline/utils/eval.py to properly handle the executor="none" case
- Ensures input_file uses data_path directly when executor="none" instead of constructing container-specific paths

Testing

- Tested evaluation with executor="none" and custom data_path
- Verified that input_file paths are correctly constructed without /nemo_run/code/ prefix
- Confirmed evaluation runs successfully without file not found errors

Related Issues

Fixes #1088

Summary by CodeRabbit

Bug Fixes
- Improved data path resolution for non-cluster pipeline execution to correctly handle mounted paths when using specific executor configurations, ensuring consistent file access across execution environments.

coderabbitai · 2025-12-10T03:44:57Z

Walkthrough

The change updates the condition in get_benchmark_args_from_module to also consider cluster_config["executor"] == "none", ensuring that non-container executions properly treat data paths as mounted paths instead of prepending the container-specific /nemo_run/code/ prefix.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Non-cluster executor condition fix `nemo_skills/pipeline/utils/eval.py` | Updated condition in `get_benchmark_args_from_module` to include `cluster_config["executor"] == "none"` check, ensuring input files use mounted path format for non-container executions instead of container-specific paths. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Single-file change with a straightforward conditional logic update
Addresses a specific bug fix without introducing new functionality or complex interactions
Change is minimal and self-contained to the affected condition

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix input_file path handling when executor is "none"' directly matches the main change: handling paths when executor is "none", which is the core focus of the changeset. |
| Linked Issues check | ✅ Passed | The code change modifies the condition in get_benchmark_args_from_module() to treat executor="none" as a non-cluster path case [#1088], ensuring input_file uses mounted path format instead of the /nemo_run/code prefix, which directly addresses the bug objectives. |
| Out of Scope Changes check | ✅ Passed | The change is narrowly scoped to the eval.py file, modifying only the path handling condition for executor="none" to fix the path construction issue described in #1088, with no extraneous modifications. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

nemo_skills/pipeline/utils/eval.py (1)
93-101: Executor "none" behavior matches intent; consider separating it from the mounted‑path branch

The added `or cluster_config["executor"] == "none"` clause ensures input_file is built directly from data_path when there is no container, so you avoid the /nemo_run/code prefix and the later "/nemo_run/code" -> "./" replacement. That directly addresses the bug described in the PR.

One subtle point: in this branch you still run the same “mounted path” logic:

input_file is f"{data_path}/.../{split}.jsonl".

unmounted_path is derived via get_unmounted_path(...) and Path(__file__).parents[3] / ....

For absolute data_path this is effectively a no-op and behaves as desired. For relative data_path, though, input_file will be resolved relative to the current working directory, while the existence check will use unmounted_path resolved relative to Path(__file__).parents[3] (likely the repo root). If users ever pass a relative data_path under executor="none", this could be surprising.
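
The difference between the two cases can be illustrated with plain path joins. This is a sketch under stated assumptions: `resolve_against` is a hypothetical helper, and the two anchor directories are made-up stand-ins for the current working directory and for `Path(__file__).parents[3]`. It relies only on the standard pathlib behavior that joining onto an absolute path discards the left-hand side.

```python
from pathlib import PurePosixPath


def resolve_against(anchor: str, data_path: str) -> str:
    """Hypothetical helper: join data_path onto an anchor directory."""
    # pathlib drops the anchor entirely when data_path is absolute.
    return str(PurePosixPath(anchor) / data_path)


# Assumed example directories, not taken from the repository.
cwd = "/home/user/project/run"    # where the command is launched
repo_root = "/home/user/project"  # stand-in for Path(__file__).parents[3]

# Absolute data_path: both anchors agree.
abs_from_cwd = resolve_against(cwd, "/data/bench")
abs_from_repo = resolve_against(repo_root, "/data/bench")

# Relative data_path: the two anchors give different locations.
rel_from_cwd = resolve_against(cwd, "data/bench")
rel_from_repo = resolve_against(repo_root, "data/bench")

print(abs_from_cwd == abs_from_repo)  # True
print(rel_from_cwd == rel_from_repo)  # False
```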

You might want to make the "none" case explicit and treat it as purely local (no mount transforms), which also makes the intent clearer:
-    if not is_on_cluster:
-        if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
-            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
-            unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
-            unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))
-        else:
-            # will be copied over in this case as it must come from extra datasets
-            input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl"
-            unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl"
+    if not is_on_cluster:
+        if cluster_config["executor"] == "none":
+            # Local, no container: use the provided data_path directly
+            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_path = input_file
+        elif pipeline_utils.is_mounted_filepath(cluster_config, data_path):
+            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
+            unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))
+        else:
+            # will be copied over in this case as it must come from extra datasets
+            input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl"
This keeps the new behavior while making the "none" semantics unambiguous and avoiding any dependency on mount-related helpers in the non-container case.
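
To make the three branches easier to compare, here is a self-contained sketch of the same decision logic. It is not the repository code: `select_paths`, the `is_mounted` flag, and the `unmount` callable are hypothetical stand-ins for `pipeline_utils.is_mounted_filepath` and `pipeline_utils.get_unmounted_path`, and the executor value "local" is used only as a placeholder for any executor other than "none".

```python
from pathlib import PurePosixPath
from typing import Callable, Tuple


def select_paths(
    executor: str,
    data_path: str,
    benchmark: str,
    split: str,
    is_mounted: bool,
    unmount: Callable[[str], str] = lambda path: path,
) -> Tuple[str, str]:
    """Return (input_file, unmounted_path) for the non-cluster case (sketch)."""
    relative = f"{benchmark.replace('.', '/')}/{split}.jsonl"
    if executor == "none":
        # Local, no container: use the provided data_path directly.
        input_file = f"{data_path}/{relative}"
        return input_file, input_file
    if is_mounted:
        # Mounted path: same input_file, host path obtained via the stand-in.
        input_file = f"{data_path}/{relative}"
        return input_file, unmount(input_file)
    # Otherwise the data is copied into the container under /nemo_run/code.
    input_file = f"/nemo_run/code/{PurePosixPath(data_path).name}/{relative}"
    return input_file, str(PurePosixPath(data_path) / relative)


none_case = select_paths("none", "/data", "gsm8k", "test", is_mounted=False)
copied_case = select_paths("local", "/home/u/extra", "gsm8k", "test", is_mounted=False)
print(none_case)
print(copied_case)
```

In this sketch the "none" branch never touches the mount-related stand-in, which is the property the suggestion above is aiming for.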

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 774cef6 and dbf8c31.

📒 Files selected for processing (1)

nemo_skills/pipeline/utils/eval.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

nemo_skills/pipeline/utils/eval.py (1)
- nemo_skills/pipeline/utils/mounts.py (1): is_mounted_filepath (27-46)

bzantium · 2025-12-10T04:30:35Z

@coderabbitai generate docstrings

coderabbitai · 2025-12-10T04:30:40Z

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

coderabbitai · 2025-12-10T04:30:43Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #1090

@bzantium

Docstrings generation was requested by @bzantium.
- #1089 (comment)

The following files were modified:
- `nemo_skills/dataset/utils.py`
- `nemo_skills/pipeline/utils/eval.py`

…th directly instead of prepending /nemo_run/code/ Signed-off-by: bzantium <ryumin93@gmail.com>

gwarmstrong

Thanks @bzantium !
Can you please apply this commit? 2e94b45
That way we have tests of the behavior and can ensure we do not have a regression later.

…NVIDIA-NeMo#1088