Conversation
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds optional input templating to SFT dataset processing: PromptResponseDataset accepts an input_template_path, loads a YAML template when provided, generates a formatted_input via apply_input_template, switches the effective input_key to "formatted_input" for downstream processing, and setup_data forwards the template path.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Trainer as setup_data
    participant PRD as PromptResponseDataset
    participant DS as Dataset.map
    User->>Trainer: call setup_data(..., input_template_path)
    Trainer->>PRD: PromptResponseDataset(..., input_template_path)
    alt input_template provided
        PRD->>PRD: load_prompt_config (YAML) -> input_template
        PRD->>DS: map(apply_input_template)
        DS-->>PRD: examples with formatted_input
        PRD->>PRD: set current_input_key = "formatted_input"
    else no template
        PRD->>PRD: keep configured input_key
    end
    PRD->>DS: map(add_messages_key, fn_kwargs={"input_key": current_input_key})
    DS-->>PRD: dataset with messages
    PRD-->>Trainer: return prepared dataset
```
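The templating flow summarized above can be sketched as a minimal, self-contained example. Note the assumptions: in the PR the template comes from a YAML file via load_prompt_config, which is simulated here with a plain dict; only the "user" key, the "formatted_input" column, and the apply_input_template name come from the PR itself.

```python
# Hedged sketch of the input-templating flow described in the walkthrough.
# In the PR the config below is loaded from a YAML file; here it is a plain
# dict so the example stays self-contained.
template_config = {"user": "Solve the following problem:\n{problem}"}

if "user" not in template_config:  # mirrors the check discussed in the review
    raise KeyError("'user' key is missing in the template config")
template = template_config["user"]

def apply_input_template(template: str, examples: dict) -> dict:
    """Render one formatted string per row of a batched examples dict."""
    keys = list(examples.keys())
    num_rows = len(examples[keys[0]])
    examples["formatted_input"] = [
        template.format(**{k: examples[k][i] for k in keys})
        for i in range(num_rows)
    ]
    return examples

batch = {"problem": ["2 + 2 = ?", "5 - 3 = ?"]}
out = apply_input_template(template, batch)
```

Downstream, the dataset's effective input key would then switch to "formatted_input", matching the sequence above.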
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

🚥 Pre-merge checks: ✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
89-89: LGTM! The parameter addition maintains backward compatibility with an appropriate default value.

259-259: LGTM! Safe retrieval of optional configuration parameter with appropriate default.
Actionable comments posted: 0
♻️ Duplicate comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
143-143: Add descriptive assertion message. The assertion lacks an explanatory message. When it fails, users won't understand why template application is incompatible with message-formatted data.

Apply this diff:

```diff
-assert "messages" not in dataset.column_names
+assert "messages" not in dataset.column_names, (
+    "Cannot apply input_template to datasets that already have 'messages' format. "
+    "Input templates are only supported for input/output format datasets."
+)
```
181-186: Add error handling, validation, and documentation. The method has several issues identified in past reviews that remain unaddressed:

- Undocumented multi-key feature: line 182 splits `input_key` by comma but this isn't documented.
- Missing validation: no check that keys exist in `examples` or that batch lengths are consistent.
- Missing error handling: lines 183-185 can raise `KeyError`, `IndexError`, or `ValueError` without context.

Apply this diff to add comprehensive error handling:

```diff
 def apply_input_template(self, examples: dict[str, list[Any]]) -> dict[str, list[str]]:
+    """Apply input template to examples, supporting comma-separated input keys.
+
+    Args:
+        examples: Batched examples dict with keys specified in self.input_key
+
+    Returns:
+        Dict with "formatted_input" key containing formatted strings
+
+    Raises:
+        KeyError: If template references keys not in examples
+        ValueError: If examples are empty or template format is invalid
+    """
     keys = [k.strip() for k in self.input_key.split(",")]
+
+    # Validate all keys exist
+    missing_keys = [k for k in keys if k not in examples]
+    if missing_keys:
+        raise KeyError(
+            f"Template references missing keys: {missing_keys}. "
+            f"Available keys: {list(examples.keys())}"
+        )
+
+    # Validate non-empty
+    if not examples[keys[0]]:
+        return {"formatted_input": []}
+
-    examples["formatted_input"] = [
-        self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
-    ]
+    try:
+        examples["formatted_input"] = [
+            self.input_template.format(**{k: examples[k][i] for k in keys})
+            for i in range(len(examples[keys[0]]))
+        ]
+    except (KeyError, ValueError, IndexError) as e:
+        raise ValueError(
+            f"Failed to apply template: {e}. "
+            f"Template: {self.input_template[:100]}..."
+        ) from e
     return examples
```
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
103-109: LGTM! Template loading is well-implemented. The error handling for template file operations is comprehensive, including proper encoding specification and validation of the required "user" key in the YAML structure.

Optional: Consider adding a check for empty template values to catch configuration errors earlier:

```diff
 if "user" not in data:
     raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
 self.input_template = data["user"]
+if not self.input_template or not self.input_template.strip():
+    raise ValueError(f"Template 'user' field is empty in: {input_template_path}")
```
168-179: LGTM! Critical bug fixed. The method signature change correctly addresses the critical `self.input_key` mutation bug identified in past reviews by accepting `input_key` as a parameter.

Consider adding `strict=True` to `zip()` for data integrity. The `zip()` call on line 177 should use `strict=True` (Python 3.10+) to catch mismatched input/output list lengths, which could indicate data quality issues.

Apply this diff:

```diff
 "messages": [
     [
         {"role": "user", "content": input_},
         {"role": "assistant", "content": output},
     ]
-    for input_, output in zip(examples[input_key], examples[self.output_key])
+    for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
 ]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.2)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
266-266: LGTM! The propagation of `input_template_path` from configuration to `PromptResponseDataset` is implemented correctly with appropriate default handling.
Actionable comments posted: 0
♻️ Duplicate comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
181-186: Previous review comment still applies. The concerns raised in the previous review regarding missing error handling, undocumented multi-key support (line 182 now uses a semicolon separator), and lack of validation remain valid.
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
103-109: Add error handling for YAML parsing failures. While the code checks for the 'user' key, it doesn't handle potential YAML parsing errors or empty template content. If the YAML file is malformed or the template is empty, users will encounter unclear errors during dataset processing.

Apply this diff to add validation:

```diff
 self.input_template = None
 if input_template_path:
-    with open(input_template_path, "rt", encoding="utf-8") as fin:
-        data = yaml.safe_load(fin)
-    if "user" not in data:
-        raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
-    self.input_template = data["user"]
+    try:
+        with open(input_template_path, "rt", encoding="utf-8") as fin:
+            data = yaml.safe_load(fin)
+        if not data or not isinstance(data, dict):
+            raise ValueError(f"Template file must contain a YAML dictionary: {input_template_path}")
+        if "user" not in data:
+            raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
+        self.input_template = data["user"]
+        if not self.input_template or not self.input_template.strip():
+            raise ValueError(f"Template 'user' field is empty: {input_template_path}")
+    except (FileNotFoundError, yaml.YAMLError) as e:
+        raise ValueError(f"Failed to load template file {input_template_path}: {e}")
```
168-170: LGTM: Signature change is correct. The addition of the `input_key` parameter is necessary and correctly enables dynamic input key selection when templates are used.

Optional: Consider adding `strict=True` to the `zip()` call at line 177 to ensure batch consistency:

```diff
-for input_, output in zip(examples[input_key], examples[self.output_key])
+for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
```

This provides an extra safeguard, though HuggingFace datasets batching already ensures equal-length lists.

Also applies to: 177-177
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.2)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (3)
nemo_skills/training/nemo_rl/start_sft.py (3)
25-25: LGTM! The `yaml` import is necessary for loading the template file and is used correctly at line 106.
141-149: LGTM: Critical mutation issue resolved. The introduction of the `current_input_key` local variable correctly addresses the previous critical issue where `self.input_key` was mutated during processing, which would have caused incorrect behavior with validation datasets.

Note: The assertion at line 143 still lacks a descriptive message, as flagged in a previous review comment.
266-266: LGTM! The parameter forwarding correctly passes the `input_template_path` from the data configuration to the `PromptResponseDataset` constructor with a safe default.
```python
output_key: str = "output",
num_proc: int | None = None,
force_reprocess: bool = False,
input_template_path: str | None = None,
```
also wouldn't this fail for typical cases when e.g. you have keys like problem and generation in your file, but the logic for setting defaults checks for input / output or messages? So that part should also be updated
@Kipok I didn't understand this. When a user passes input_template_path, their provided input_key field must match the keys present in the input_template. Why would having keys like problem and generation cause a failure?
I mean the logic in detect_data_format function
Actionable comments posted: 1
♻️ Duplicate comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
181-186: Add error handling and document multi-key support. The method can fail with cryptic errors if the template references missing keys, examples are empty, or the template syntax is invalid. Additionally, the semicolon-separated multi-key support is undocumented.

Apply this diff to add error handling and documentation:

```diff
 def apply_input_template(self, examples: dict[str, list[Any]]) -> dict[str, list[str]]:
+    """Apply input template to examples, supporting semicolon-separated input keys.
+
+    Args:
+        examples: Batched examples dict with input keys
+
+    Returns:
+        Dict with "formatted_input" key containing formatted strings
+
+    Raises:
+        KeyError: If template references keys not in examples
+        ValueError: If examples are empty or template syntax is invalid
+    """
     keys = [k.strip() for k in self.input_key.split(";")]
+
+    # Validate all keys exist
+    missing_keys = [k for k in keys if k not in examples]
+    if missing_keys:
+        raise KeyError(
+            f"Template references missing keys: {missing_keys}. "
+            f"Available keys: {list(examples.keys())}"
+        )
+
+    # Validate non-empty
+    if not examples or not examples[keys[0]]:
+        return {"formatted_input": []}
+
-    examples["formatted_input"] = [
-        self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
-    ]
+    try:
+        examples["formatted_input"] = [
+            self.input_template.format(**{k: examples[k][i] for k in keys})
+            for i in range(len(examples[keys[0]]))
+        ]
+    except (KeyError, ValueError) as e:
+        raise ValueError(
+            f"Failed to apply template: {e}. Template: {self.input_template[:100]}..."
+        ) from e
     return examples
```
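The up-front key validation requested in this comment can be shown standalone. Assumptions: the function name `validate_input_keys` and the error wording are illustrative, only the semicolon-separated `input_key` convention comes from the PR:

```python
# Hedged sketch of the validation the review asks for: check every key named
# in a semicolon-separated input_key before attempting to format anything.
def validate_input_keys(input_key: str, examples: dict) -> list:
    keys = [k.strip() for k in input_key.split(";")]
    missing = [k for k in keys if k not in examples]
    if missing:
        raise KeyError(
            f"Template references missing keys: {missing}. "
            f"Available keys: {list(examples.keys())}"
        )
    return keys

batch = {"problem": ["p1"], "context": ["c1"]}
keys = validate_input_keys("problem; context", batch)

try:
    validate_input_keys("problem; generation", batch)
    missing_caught = False
except KeyError:
    missing_caught = True
```

Validating before the per-row formatting loop turns a confusing mid-batch failure into a single, descriptive error that names the offending columns.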
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
141-149: Critical bug fixed! Consider adding an assertion message for better debugging. Excellent fix! The code now correctly uses a local variable `current_input_key` instead of mutating `self.input_key`, which resolves the critical bug from previous reviews where validation datasets would fail.

Optional improvement: Add a descriptive message to the assertion at line 143 for better debugging:

```diff
-assert "messages" not in dataset.column_names
+assert "messages" not in dataset.column_names, (
+    "Cannot apply input_template to datasets that already have 'messages' format. "
+    "Input templates are only supported for input/output format datasets."
+)
```
177-177: LGTM! Consider adding `strict=True` to `zip()` for safety. The code correctly uses the passed `input_key` parameter.

Optional improvement: Add `strict=True` to catch length mismatches between input and output data:

```diff
-for input_, output in zip(examples[input_key], examples[self.output_key])
+for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
```

Note: This requires Python 3.10+. If targeting earlier versions, this change should be skipped.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (5)
nemo_skills/training/nemo_rl/start_sft.py (5)
39-40: LGTM! The import correctly uses nemo-skills utilities as requested, and the alias avoids naming conflicts with the existing `load_config` import.
91-91: LGTM! The parameter is well-named and appropriately optional.
156-156: LGTM! Correctly passes the input key to `add_messages_key` via `fn_kwargs`, completing the fix for the validation dataset issue.
168-170: LGTM! The signature correctly accepts `input_key` as a parameter, making the method more flexible and fixing the validation dataset bug.
266-266: LGTM! The template path is correctly propagated from the data config to the dataset constructor.
```python
self.input_template = None
if input_template_path:
    input_template_config = load_prompt_config(input_template_path)
    if "user" not in input_template_config:
        raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
    self.input_template = input_template_config["user"]
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify error handling in load_prompt_config.
The template loading logic is sound, but ensure that load_prompt_config properly handles file errors (missing files, permission issues, invalid YAML) to provide clear error messages to users.
Run the following script to examine the error handling in load_prompt_config:
Note: Line 108 has a long error message in the exception. Consider extracting it to a variable for better style compliance.

```diff
-if "user" not in input_template_config:
-    raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
+if "user" not in input_template_config:
+    msg = f"'user' key is missing in the YAML file: {input_template_path}"
+    raise KeyError(msg)
```

🏁 Script executed:
```shell
#!/bin/bash
# Description: Check error handling in load_prompt_config function
# Find and display the load_config function implementation
ast-grep --pattern $'def load_config($$$) {
  $$$
}'
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Locate the implementation of load_prompt_config in the Python codebase
rg -nP --type=py "def load_prompt_config" -C3
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Search for load_prompt_config references across the repository
rg -n "load_prompt_config"
```

Length of output: 237
🏁 Script executed:

```shell
#!/bin/bash
# Display the implementation of load_config in the prompt utils module
rg -n "def load_config" -C10 nemo_skills/prompt
```

Length of output: 1538
🏁 Script executed:

```shell
#!/bin/bash
# Show full implementation of load_config to inspect error handling
sed -n '350,450p' nemo_skills/prompt/utils.py
```

Length of output: 1814
Add error handling in load_config and extract literal messages to variables
- In nemo_skills/prompt/utils.py (load_config), wrap the `open(config_path)` and `yaml.safe_load` calls in a try/except that catches FileNotFoundError, PermissionError, and yaml.YAMLError, then rethrows with a clear message including `config_path`.
- In nemo_skills/training/nemo_rl/start_sft.py (lines 108–109), extract the KeyError message into a `msg` variable before raising.
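The wrap-and-chain pattern requested in the first bullet can be sketched generically. Assumption: `load_config` below is a stand-in illustrating the pattern, not the actual nemo_skills/prompt/utils.py implementation, and it omits the YAML-specific handling:

```python
# Sketch of "rethrow with a clear message including config_path" while
# keeping the original exception chained for debugging.
def load_config(config_path: str) -> str:
    try:
        with open(config_path, "rt", encoding="utf-8") as fin:
            return fin.read()
    except (FileNotFoundError, PermissionError) as e:
        # "from e" chains the original exception so the root cause stays
        # visible in the traceback while the message names the config path.
        raise ValueError(f"Failed to load config file {config_path}: {e}") from e

try:
    load_config("/nonexistent/config.yaml")
except ValueError as e:
    chained = isinstance(e.__cause__, FileNotFoundError)
    message = str(e)
```

Chaining with `from e` is what makes the rethrow reviewer-friendly: users see the path-bearing message, while the traceback still shows the OS-level cause.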
🧰 Tools
🪛 Ruff (0.13.3)
108-108: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In nemo_skills/prompt/utils.py (around the load_config function) wrap the file
open and yaml.safe_load calls in a try/except that catches FileNotFoundError,
PermissionError, and yaml.YAMLError and rethrows a new exception (or raise) with
a clear message that includes the config_path; ensure the original exception is
chained. In nemo_skills/training/nemo_rl/start_sft.py around lines 104–109,
extract the KeyError message into a local variable msg (e.g. msg = f"...") and
raise KeyError(msg) instead of inlining the formatted string in the raise
statement.
Kipok left a comment:
added a few more comments. If you want to merge this please also add a new gpu test in https://github.com/NVIDIA-NeMo/Skills/blob/main/tests/gpu-tests/test_train.py to ensure this is being tested in the ci
```python
output_key: str = "output",
num_proc: int | None = None,
force_reprocess: bool = False,
input_template_path: str | None = None,
```
I mean the logic in detect_data_format function
```python
print(f"[Map] Processing {split_name} dataset from: {path}")
dataset = load_dataset("json", data_files=str(path))["train"]

current_input_key = self.input_key
```
@wasiahmad this is an important thing to fix if you want to merge this
@wasiahmad do you still need this feature? If not, maybe we close this without merging for now as some comments are still unaddressed here
```python
examples["formatted_input"] = [
    self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
]
```

accessing examples[k][i] will fail if any key k doesn't exist in the dataset

Suggested change:

```python
formatted_inputs = []
for i in range(len(examples[keys[0]])):
    format_dict = {k: examples[k][i] for k in keys}
    formatted_inputs.append(self.input_template.format(**format_dict))
examples["formatted_input"] = formatted_inputs
```
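The failure mode flagged here can be reproduced in isolation. The column names below are illustrative, only the `examples[k][i]` access pattern comes from the PR:

```python
# Building the format dict with examples[k][i] raises a bare KeyError when a
# configured key is absent from the dataset batch.
examples = {"problem": ["p1", "p2"]}  # note: no "generation" column
keys = ["problem", "generation"]

try:
    rows = [
        {k: examples[k][i] for k in keys}
        for i in range(len(examples[keys[0]]))
    ]
    key_error = None
except KeyError as e:
    key_error = str(e)  # just 'generation', with no dataset context

assert key_error == "'generation'"
```

This is why the earlier review comments ask for an explicit missing-key check: a bare `KeyError: 'generation'` deep inside `Dataset.map` gives the user no hint about which configuration field to fix.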
Additional Comments (2)
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Actionable comments posted: 1
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/evaluation/evaluator/audio.py (1)
376-386: ⚠️ Potential issue | 🟡 Minor

Return a stable ASR-PC metric schema for `missing_generation`. At Line 376 through Line 386, the ASR-PC early return only includes `wer`, while the normal ASR-PC path returns `wer`, `wer_c`, `wer_pc`, and `per`. This can break downstream consumers expecting ASR-PC keys.

Suggested fix:

```diff
-# ASR / ASR-PC
-return {**base, "wer": 1.0}
+if task_type == "ASR-PC":
+    return {**base, "wer": 1.0, "wer_c": 1.0, "wer_pc": 1.0, "per": 1.0}
+return {**base, "wer": 1.0}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/evaluation/evaluator/audio.py` around lines 376 - 386, The early-return for missing_generation when task_type is "ASR-PC" only returns "wer" but downstream expects the full ASR-PC metric schema; update the branch in the block that checks task_type and generation (the if task_type in [...] and not generation) so that when task_type == "ASR-PC" it returns the complete set of keys used by the normal ASR-PC path (e.g., wer, wer_c, wer_pc, per) with appropriate default values (e.g., 1.0 for error rates) by merging them into the existing base dict instead of returning only wer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/evaluation/evaluator/audio.py`:
- Around line 166-167: The code currently always applies Whisper normalization
by calling preprocess_asr_text(reference) and preprocess_asr_text(hypothesis)
and calls evaluate_asr(...) without honoring config.apply_whisper_normalization
or config.normalization_mode; change the flow so that before preprocessing (or
before calling evaluate_asr) you check config.apply_whisper_normalization and
config.normalization_mode: if apply_whisper_normalization is True and
normalization_mode == "whisper" call the Whisper-specific normalization routine
(or call preprocess_asr_text with a mode parameter), if
apply_whisper_normalization is False skip Whisper normalization and use the
standard text preprocessing, and if normalization_mode is set to an unsupported
value raise an explicit error; also update the evaluate_asr(...) call site to
pass or respect these config flags rather than ignoring them so user settings
are enforced (refer to preprocess_asr_text, evaluate_asr,
config.apply_whisper_normalization, and config.normalization_mode).
---
Outside diff comments:
In `@nemo_skills/evaluation/evaluator/audio.py`:
- Around line 376-386: The early-return for missing_generation when task_type is
"ASR-PC" only returns "wer" but downstream expects the full ASR-PC metric
schema; update the branch in the block that checks task_type and generation (the
if task_type in [...] and not generation) so that when task_type == "ASR-PC" it
returns the complete set of keys used by the normal ASR-PC path (e.g., wer,
wer_c, wer_pc, per) with appropriate default values (e.g., 1.0 for error rates)
by merging them into the existing base dict instead of returning only wer.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
nemo_skills/dataset/asr-leaderboard/__init__.py
nemo_skills/evaluation/evaluator/audio.py
```python
ref_std = preprocess_asr_text(reference)
hyp_std = preprocess_asr_text(hypothesis)
```
apply_whisper_normalization / normalization_mode are effectively ignored in the new flow.
At Line 166 and Line 167, ASR-PC standard WER always uses Whisper normalization when enabled, regardless of normalization_mode. At Line 399, ASR always calls evaluate_asr(...) without honoring config.apply_whisper_normalization. This silently ignores user-provided settings.
Suggested fix (make config effective and fail on unsupported modes):

```diff
 def evaluate_asr_pc(
     reference: str, hypothesis: str, normalize_standard_wer: bool = True, normalization_mode: str = "standard"
 ) -> dict[str, Any]:
@@
-    if normalize_standard_wer:
-        ref_std = preprocess_asr_text(reference)
-        hyp_std = preprocess_asr_text(hypothesis)
+    if normalize_standard_wer:
+        if normalization_mode == "standard":
+            ref_std = preprocess_asr_text(reference)
+            hyp_std = preprocess_asr_text(hypothesis)
+        elif normalization_mode == "none":
+            ref_std = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
+            hyp_std = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
+        else:
+            raise ValueError(f"Unsupported normalization_mode: {normalization_mode}")
     else:
         ref_std = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
         hyp_std = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
```

```diff
-def evaluate_asr(reference: str, hypothesis: str) -> dict[str, Any]:
+def evaluate_asr(reference: str, hypothesis: str, apply_whisper_normalization: bool = True) -> dict[str, Any]:
@@
-    ref = preprocess_asr_text(reference)
-    hyp = preprocess_asr_text(hypothesis)
+    if apply_whisper_normalization:
+        ref = preprocess_asr_text(reference)
+        hyp = preprocess_asr_text(hypothesis)
+    else:
+        ref = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
+        hyp = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
```

```diff
-    elif task_type == "ASR":
-        metrics = evaluate_asr(expected_answer, generation)
+    elif task_type == "ASR":
+        metrics = evaluate_asr(
+            expected_answer,
+            generation,
+            apply_whisper_normalization=config.apply_whisper_normalization,
+        )
     updates.update(metrics)
```

As per coding guidelines "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically".
Also applies to: 399-399
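The mode dispatch proposed in the suggested fix can be sketched standalone. Assumptions: `normalize_whitespace` mirrors the helper named in the diff, while the lowercasing stand-in replaces the Whisper-specific `preprocess_asr_text`; the mode names "standard" and "none" follow the diff:

```python
import re

# Minimal sketch of a normalization_mode dispatch that fails loudly on
# unsupported values instead of silently ignoring user configuration.
def normalize_whitespace(text: str) -> str:
    return " ".join(text.split())

def normalize(text: str, mode: str) -> str:
    if mode == "standard":
        return normalize_whitespace(text.lower())  # stand-in for Whisper normalization
    if mode == "none":
        # strip punctuation, lowercase, collapse whitespace
        return normalize_whitespace(re.sub(r"[^\w\s]", "", text.lower()))
    raise ValueError(f"Unsupported normalization_mode: {mode}")

stripped = normalize("Hello,   World!", "none")
assert stripped == "hello world"

try:
    normalize("x", "whisper-ish")
    rejected = False
except ValueError:
    rejected = True
```

Raising on unknown modes is the point of the guideline quoted above: a silently ignored `normalization_mode` produces metrics the user did not ask for.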
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/evaluation/evaluator/audio.py` around lines 166 - 167, The code
currently always applies Whisper normalization by calling
preprocess_asr_text(reference) and preprocess_asr_text(hypothesis) and calls
evaluate_asr(...) without honoring config.apply_whisper_normalization or
config.normalization_mode; change the flow so that before preprocessing (or
before calling evaluate_asr) you check config.apply_whisper_normalization and
config.normalization_mode: if apply_whisper_normalization is True and
normalization_mode == "whisper" call the Whisper-specific normalization routine
(or call preprocess_asr_text with a mode parameter), if
apply_whisper_normalization is False skip Whisper normalization and use the
standard text preprocessing, and if normalization_mode is set to an unsupported
value raise an explicit error; also update the evaluate_asr(...) call site to
pass or respect these config flags rather than ignoring them so user settings
are enforced (refer to preprocess_asr_text, evaluate_asr,
config.apply_whisper_normalization, and config.normalization_mode).
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/dataset/asr-leaderboard/__init__.py`:
- Around line 17-18: The evaluator currently uses sample.get("task_type",
"unknown") which silently skips WER computation; update evaluate_sample in
nemo_skills/evaluation/evaluator/audio.py to access sample["task_type"] directly
and add explicit validation (raise a clear exception) for allowed values (e.g.,
"ASR", "ASR-PC", "ASR_LEADERBOARD") before any metric computation so
missing/invalid task_type fails fast instead of falling through to the else
branch that omits WER.
```python
# Uses the audio evaluator which computes WER with Whisper-style text normalization
# Data samples should have task_type="ASR" for proper WER calculation
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# 1) Locate task_type handling
echo "=== Search 1: task_type handling ==="
rg -n -C3 --type=py '\btask_type\b' 2>/dev/null | head -100
# 2) Locate audio evaluation / WER entry points
echo ""
echo "=== Search 2: Audio eval / WER entry points ==="
rg -n -C3 --type=py 'WER|word.?error|whisper|eval_type\s*=\s*audio|METRICS_TYPE\s*=\s*"audio"' 2>/dev/null | head -100
# 3) Look specifically for ASR guards/assertions
echo ""
echo "=== Search 3: ASR guards/assertions ==="
rg -n -C3 --type=py 'assert .*task_type|if .*task_type.*ASR|task_type\s*==\s*["'"'"']ASR["'"'"']' 2>/dev/null | head -100
```

Repository: NVIDIA-NeMo/Skills
Length of output: 18378
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Get the complete evaluate_sample function
echo "=== Full evaluate_sample function ==="
rg -A 70 'def evaluate_sample\(sample: dict' nemo_skills/evaluation/evaluator/audio.py
# Check if there's any else/default handling for unmatched task_type
echo ""
echo "=== Check for else/default handling ==="
rg -A 80 'elif task_type == "PC-Rate"' nemo_skills/evaluation/evaluator/audio.py
```

Repository: NVIDIA-NeMo/Skills
Length of output: 4129
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Find where evaluate_sample is called
echo "=== Callers of evaluate_sample ==="
rg -B5 -A3 'evaluate_sample\(' nemo_skills/evaluation/evaluator/audio.py | head -60
# Check if there's any validation of task_type before calling evaluate_sample
echo ""
echo "=== Check for task_type validation before evaluate_sample call ==="
rg -B10 'evaluate_sample' nemo_skills/evaluation/evaluator/audio.py | grep -A 10 'def '
```

Repository: NVIDIA-NeMo/Skills
Length of output: 1613
Enforce task_type="ASR" validation in the evaluator, not only in comments.
The evaluate_sample function at nemo_skills/evaluation/evaluator/audio.py:472 uses .get("task_type", "unknown") with a silent default. If task_type is missing or doesn't match expected values (ASR, ASR-PC, ASR_LEADERBOARD, etc.), the code silently falls through to the else clause (lines 528–531), which skips WER computation and returns minimal fields. This contradicts the documented requirement in nemo_skills/dataset/asr-leaderboard/__init__.py:17-18 that data samples should have task_type="ASR" for proper WER calculation.
Use direct access sample["task_type"] instead of .get() and add explicit validation before metric computation to fail fast when task_type is missing or invalid, preventing silent metric loss.
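The fail-fast validation described above can be sketched as follows. Assumption: the allowed set is illustrative and would need to match the evaluator's real task types:

```python
# Hedged sketch of fail-fast task_type validation: direct access raises on a
# missing key, and an explicit allow-list rejects unknown values up front.
ALLOWED_TASK_TYPES = {"ASR", "ASR-PC", "ASR_LEADERBOARD"}

def get_task_type(sample: dict) -> str:
    task_type = sample["task_type"]  # KeyError if missing: fail fast
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(
            f"Unsupported task_type {task_type!r}; "
            f"expected one of {sorted(ALLOWED_TASK_TYPES)}"
        )
    return task_type

ok = get_task_type({"task_type": "ASR"})

try:
    get_task_type({"task_type": "unknown"})
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```

Compared with `.get("task_type", "unknown")`, this surfaces bad data at the first sample instead of silently returning metrics with the WER fields missing.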
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/dataset/asr-leaderboard/__init__.py` around lines 17 - 18, The
evaluator currently uses sample.get("task_type", "unknown") which silently skips
WER computation; update evaluate_sample in
nemo_skills/evaluation/evaluator/audio.py to access sample["task_type"] directly
and add explicit validation (raise a clear exception) for allowed values (e.g.,
"ASR", "ASR-PC", "ASR_LEADERBOARD") before any metric computation so
missing/invalid task_type fails fast instead of falling through to the else
branch that omits WER.
probably won't merge this @wasiahmad as we are trying to upstream everything into gym / nemo-rl. If you want to keep this functionality please create a pr into nemo-rl directly (probably not using our prompt template, but you can introduce similar logic there). In the future, we will try to always use a built-in script and if we need changes, they'd need to go to nemo-rl directly to avoid divergence
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Note
Adds optional YAML-driven input templating to combine multiple fields into a formatted_input used for message construction, configurable via data.input_template_path.
- Loads the template via `load_prompt_config` and renders `formatted_input` from multiple fields (semicolon-separated `input_key`). Requires a `user` key in the template.
- Uses `formatted_input` when provided; `add_messages_key` updated to accept an `input_key` parameter.
- `apply_input_template` map step applied before message creation; asserts no pre-existing `messages`.
- `data.input_template_path` accepted and passed into dataset setup.

Written by Cursor Bugbot for commit 2353efe. This will update automatically on new commits. Configure here.