Fix input_file path handling when executor is "none" #1089

Merged
gwarmstrong merged 2 commits into NVIDIA-NeMo:main from bzantium:feature/#1088
Dec 11, 2025

@bzantium (Contributor) commented Dec 10, 2025

Summary

Fixes incorrect input_file path construction when running evaluation with executor="none" (no container execution). When executor="none", we're already in the actual working environment, so we should use the provided data_path directly without container-specific path transformations.

Problem

When executor="none", the code incorrectly constructs input_file paths by prepending /nemo_run/code/ instead of using the provided data_path directly. This causes:

  1. Incorrect path construction: The code falls into the else branch (lines 98-100 in eval.py), which sets input_file = f"/nemo_run/code/{Path(data_path).name}/..."
  2. Path transformation issue: At line 673 of exp.py, /nemo_run/code is replaced with ./ for non-container execution
  3. Malformed paths: The combination produces paths like .//{Path(data_path).name}/..., which drop the parent directories of data_path and cause file-not-found errors during evaluation
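
The three steps above can be reproduced in isolation. The sketch below is illustrative only: the two helper functions are hypothetical stand-ins for the else branch in eval.py and the prefix replacement in exp.py (assumed here to be a plain string replace), and the data path, benchmark name, and split are made up.

```python
from pathlib import Path


def container_style_input_file(data_path: str, benchmark: str, split: str) -> str:
    # Hypothetical stand-in for the else branch: only the last component
    # of data_path survives, prefixed with the container code directory.
    return f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl"


def strip_container_prefix(path: str) -> str:
    # Hypothetical stand-in for the exp.py replacement described above,
    # assumed to be a simple str.replace.
    return path.replace("/nemo_run/code", "./")


# Made-up inputs for illustration.
input_file = container_style_input_file("/home/user/my_data", "gsm8k", "test")
local_path = strip_container_prefix(input_file)

print(input_file)  # /nemo_run/code/my_data/gsm8k/test.jsonl
print(local_path)  # .//my_data/gsm8k/test.jsonl
```

The final path is relative to the current directory and no longer contains /home/user, which is why the file cannot be found unless the data happens to sit under the working directory.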

Solution

Ensure that when executor="none", the code uses data_path directly without the /nemo_run/code/ prefix. The input_file should be constructed as:

input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"

This avoids container-specific path transformations that are not applicable when running without containers.
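
As a minimal sketch of that construction, assuming a hypothetical helper name and made-up arguments (only the f-string itself comes from the description above):

```python
def direct_input_file(data_path: str, benchmark: str, split: str) -> str:
    # Dots in the benchmark name become directory separators;
    # data_path is used exactly as the caller provided it.
    return f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"


# Hypothetical values for illustration.
example = direct_input_file("/home/user/my_data", "math.algebra", "test")
print(example)  # /home/user/my_data/math/algebra/test.jsonl
```

Because nothing is prepended, the later /nemo_run/code replacement has nothing to match and the path reaches the evaluation step unchanged.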

Changes

  • Modified get_benchmark_args_from_module() in nemo_skills/pipeline/utils/eval.py to handle the executor="none" case
  • input_file now uses data_path directly when executor="none" instead of a container-specific path

Testing

  • Tested evaluation with executor="none" and a custom data_path
  • Verified that input_file paths are constructed without the /nemo_run/code/ prefix
  • Confirmed that evaluation runs without file-not-found errors

Related Issues

Fixes #1088

Summary by CodeRabbit

  • Bug Fixes
    • Improved data path resolution for non-cluster pipeline execution to correctly handle mounted paths when using specific executor configurations, ensuring consistent file access across execution environments.

coderabbitai bot (Contributor) commented Dec 10, 2025

Walkthrough

The change updates the condition in get_benchmark_args_from_module to also consider cluster_config["executor"] == "none", ensuring that non-container executions properly treat data paths as mounted paths instead of prepending the container-specific /nemo_run/code/ prefix.

Changes

Cohort / File(s) | Summary
Non-cluster executor condition fix (nemo_skills/pipeline/utils/eval.py) | Updated condition in get_benchmark_args_from_module to include cluster_config["executor"] == "none" check, ensuring input files use mounted path format for non-container executions instead of container-specific paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

  • Single-file change with a straightforward conditional logic update
  • Addresses a specific bug fix without introducing new functionality or complex interactions
  • Change is minimal and self-contained to the affected condition

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Fix input_file path handling when executor is "none"' directly matches the main change: handling paths when executor is "none", which is the core focus of the changeset.
Linked Issues check | ✅ Passed | The code change modifies the condition in get_benchmark_args_from_module() to treat executor="none" as a non-cluster path case [#1088], ensuring input_file uses mounted path format instead of the /nemo_run/code prefix, which directly addresses the bug objectives.
Out of Scope Changes check | ✅ Passed | The change is narrowly scoped to the eval.py file, modifying only the path handling condition for executor="none" to fix the path construction issue described in #1088, with no extraneous modifications.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/pipeline/utils/eval.py (1)

93-101: Executor "none" behavior matches intent; consider separating it from the mounted‑path branch

The added or cluster_config["executor"] == "none" ensures input_file is built directly from data_path when there is no container, so you avoid the /nemo_run/code prefix and the later "/nemo_run/code" -> "./" replacement. That directly addresses the bug described in the PR.

One subtle point: in this branch you still run the same “mounted path” logic:

  • input_file is f"{data_path}/.../{split}.jsonl".
  • unmounted_path is derived via get_unmounted_path(...) and Path(__file__).parents[3] / ....

For absolute data_path this is effectively a no-op and behaves as desired. For relative data_path, though, input_file will be resolved relative to the current working directory, while the existence check will use unmounted_path resolved relative to Path(__file__).parents[3] (likely the repo root). If users ever pass a relative data_path under executor="none", this could be surprising.
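
The difference between the two cases comes down to how path joining treats absolute and relative operands. The sketch below uses hypothetical directories and a made-up benchmark name; repo_root merely stands in for Path(__file__).parents[3], and the mount helper is assumed to leave the path unchanged, as the comment above suggests for this case.

```python
from pathlib import PurePosixPath

# Hypothetical locations, chosen only for illustration.
cwd = PurePosixPath("/work/run_dir")    # where the evaluation is launched
repo_root = PurePosixPath("/opt/repo")  # stand-in for Path(__file__).parents[3]


def resolve(base: PurePosixPath, data_path: str) -> PurePosixPath:
    # pathlib discards `base` when the right-hand operand is absolute.
    return base / data_path / "gsm8k" / "test.jsonl"


# Absolute data_path: both bases yield the same file.
abs_same = resolve(cwd, "/data/bench") == resolve(repo_root, "/data/bench")

# Relative data_path: the file that is read and the file that is checked differ.
rel_same = resolve(cwd, "data/bench") == resolve(repo_root, "data/bench")

print(abs_same, rel_same)  # True False
```

With a relative data_path the existence check could pass or fail independently of whether the file actually used at run time exists.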

You might want to make the "none" case explicit and treat it as purely local (no mount transforms), which also makes the intent clearer:

-    if not is_on_cluster:
-        if pipeline_utils.is_mounted_filepath(cluster_config, data_path) or cluster_config["executor"] == "none":
-            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
-            unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
-            unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))
-        else:
-            # will be copied over in this case as it must come from extra datasets
-            input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl"
-            unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl"
+    if not is_on_cluster:
+        if cluster_config["executor"] == "none":
+            # Local, no container: use the provided data_path directly
+            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_path = input_file
+        elif pipeline_utils.is_mounted_filepath(cluster_config, data_path):
+            input_file = f"{data_path}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_input_file = pipeline_utils.get_unmounted_path(cluster_config, input_file)
+            unmounted_path = str(Path(__file__).parents[3] / unmounted_input_file.replace("/nemo_run/code/", ""))
+        else:
+            # will be copied over in this case as it must come from extra datasets
+            input_file = f"/nemo_run/code/{Path(data_path).name}/{benchmark.replace('.', '/')}/{split}.jsonl"
+            unmounted_path = Path(data_path) / benchmark.replace(".", "/") / f"{split}.jsonl"

This keeps the new behavior while making the "none" semantics unambiguous and avoiding any dependency on mount-related helpers in the non-container case.
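
To see the three branches side by side without the surrounding pipeline, here is a self-contained sketch. It is not the project's code: choose_input_file and the is_mounted callback are hypothetical, the branch labels are invented for the example, and "local" is used as an assumed example of a container-based executor value.

```python
from pathlib import Path
from typing import Callable


def choose_input_file(
    executor: str,
    data_path: str,
    benchmark: str,
    split: str,
    is_mounted: Callable[[str], bool],
) -> tuple[str, str]:
    """Return (input_file, branch label) following the suggested structure."""
    rel = f"{benchmark.replace('.', '/')}/{split}.jsonl"
    if executor == "none":
        # No container: use data_path exactly as given.
        return f"{data_path}/{rel}", "local"
    if is_mounted(data_path):
        # Path is visible inside the container through a mount.
        return f"{data_path}/{rel}", "mounted"
    # Otherwise the data is copied under the container code directory.
    return f"/nemo_run/code/{Path(data_path).name}/{rel}", "copied"


# Hypothetical inputs for illustration.
print(choose_input_file("none", "/home/u/extra", "a.b", "test", lambda p: False))
print(choose_input_file("local", "/mnt/data", "a.b", "test", lambda p: True))
print(choose_input_file("local", "/home/u/extra", "a.b", "test", lambda p: False))
```

Checking the executor first means the "none" case never depends on the mount predicate, which is the separation the suggestion above argues for.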

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 774cef6 and dbf8c31.

📒 Files selected for processing (1)
  • nemo_skills/pipeline/utils/eval.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/pipeline/utils/eval.py (1)
nemo_skills/pipeline/utils/mounts.py (1)
  • is_mounted_filepath (27-46)

@bzantium (Contributor, Author) commented

@coderabbitai generate docstrings

coderabbitai bot (Contributor) commented Dec 10, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

coderabbitai bot (Contributor) commented Dec 10, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #1090

coderabbitai bot added a commit that referenced this pull request Dec 10, 2025
Docstrings generation was requested by @bzantium.

* #1089 (comment)

The following files were modified:

* `nemo_skills/dataset/utils.py`
* `nemo_skills/pipeline/utils/eval.py`
@bzantium force-pushed the feature/#1088 branch 4 times, most recently from b725bcb to 7bc29e7 on December 10, 2025 13:49
…th directly instead of prepending /nemo_run/code/

Signed-off-by: bzantium <ryumin93@gmail.com>

@gwarmstrong (Collaborator) left a comment

Thanks @bzantium !
Can you please apply this commit? 2e94b45
That way we have tests of the behavior and can ensure we do not have a regression later.

@bzantium bzantium requested a review from gwarmstrong December 11, 2025 01:01
@gwarmstrong gwarmstrong merged commit 42eded0 into NVIDIA-NeMo:main Dec 11, 2025
4 of 5 checks passed
@gwarmstrong gwarmstrong mentioned this pull request Dec 11, 2025
@bzantium bzantium deleted the feature/#1088 branch December 11, 2025 01:29
gwarmstrong added a commit that referenced this pull request Dec 11, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong pushed a commit that referenced this pull request Dec 11, 2025
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong added a commit that referenced this pull request Dec 11, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 12, 2025
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 12, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: bzantium <ryumin93@gmail.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: bzantium <ryumin93@gmail.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>