Conversation
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds optional input templating to SFT dataset processing: PromptResponseDataset accepts an input_template_path, loads a YAML template when provided, generates a formatted_input via apply_input_template, switches the effective input_key to "formatted_input" for downstream processing, and setup_data forwards the template path.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Trainer as setup_data
    participant PRD as PromptResponseDataset
    participant DS as Dataset.map
    User->>Trainer: call setup_data(..., input_template_path)
    Trainer->>PRD: PromptResponseDataset(..., input_template_path)
    alt input_template provided
        PRD->>PRD: load_prompt_config (YAML) -> input_template
        PRD->>DS: map(apply_input_template)
        DS-->>PRD: examples with formatted_input
        PRD->>PRD: set current_input_key = "formatted_input"
    else no template
        PRD->>PRD: keep configured input_key
    end
    PRD->>DS: map(add_messages_key, fn_kwargs={"input_key": current_input_key})
    DS-->>PRD: dataset with messages
    PRD-->>Trainer: return prepared dataset
```
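The templating flow summarized above can be sketched as a minimal, self-contained example. Note the assumptions: in the PR the template comes from a YAML file via load_prompt_config, which is simulated here with a plain dict; only the "user" key, the "formatted_input" column, and the apply_input_template name come from the PR itself.

```python
# Hedged sketch of the input-templating flow described in the walkthrough.
# In the PR the config below is loaded from a YAML file; here it is a plain
# dict so the example stays self-contained.
template_config = {"user": "Solve the following problem:\n{problem}"}

if "user" not in template_config:  # mirrors the check discussed in the review
    raise KeyError("'user' key is missing in the template config")
template = template_config["user"]

def apply_input_template(template: str, examples: dict) -> dict:
    """Render one formatted string per row of a batched examples dict."""
    keys = list(examples.keys())
    num_rows = len(examples[keys[0]])
    examples["formatted_input"] = [
        template.format(**{k: examples[k][i] for k in keys})
        for i in range(num_rows)
    ]
    return examples

batch = {"problem": ["2 + 2 = ?", "5 - 3 = ?"]}
out = apply_input_template(template, batch)
```

Downstream, the dataset's effective input key would then switch to "formatted_input", matching the sequence above.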
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

🚥 Pre-merge checks: ✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
89-89: LGTM! The parameter addition maintains backward compatibility with an appropriate default value.

259-259: LGTM! Safe retrieval of optional configuration parameter with appropriate default.
Actionable comments posted: 0
♻️ Duplicate comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
143-143: Add descriptive assertion message. The assertion lacks an explanatory message. When it fails, users won't understand why template application is incompatible with message-formatted data.

Apply this diff:

```diff
-assert "messages" not in dataset.column_names
+assert "messages" not in dataset.column_names, (
+    "Cannot apply input_template to datasets that already have 'messages' format. "
+    "Input templates are only supported for input/output format datasets."
+)
```
181-186: Add error handling, validation, and documentation. The method has several issues identified in past reviews that remain unaddressed:

- Undocumented multi-key feature: line 182 splits `input_key` by comma but this isn't documented.
- Missing validation: no check that keys exist in `examples` or that batch lengths are consistent.
- Missing error handling: lines 183-185 can raise `KeyError`, `IndexError`, or `ValueError` without context.

Apply this diff to add comprehensive error handling:

```diff
 def apply_input_template(self, examples: dict[str, list[Any]]) -> dict[str, list[str]]:
+    """Apply input template to examples, supporting comma-separated input keys.
+
+    Args:
+        examples: Batched examples dict with keys specified in self.input_key
+
+    Returns:
+        Dict with "formatted_input" key containing formatted strings
+
+    Raises:
+        KeyError: If template references keys not in examples
+        ValueError: If examples are empty or template format is invalid
+    """
     keys = [k.strip() for k in self.input_key.split(",")]
+
+    # Validate all keys exist
+    missing_keys = [k for k in keys if k not in examples]
+    if missing_keys:
+        raise KeyError(
+            f"Template references missing keys: {missing_keys}. "
+            f"Available keys: {list(examples.keys())}"
+        )
+
+    # Validate non-empty
+    if not examples[keys[0]]:
+        return {"formatted_input": []}
+
-    examples["formatted_input"] = [
-        self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
-    ]
+    try:
+        examples["formatted_input"] = [
+            self.input_template.format(**{k: examples[k][i] for k in keys})
+            for i in range(len(examples[keys[0]]))
+        ]
+    except (KeyError, ValueError, IndexError) as e:
+        raise ValueError(
+            f"Failed to apply template: {e}. "
+            f"Template: {self.input_template[:100]}..."
+        ) from e
     return examples
```
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
103-109: LGTM! Template loading is well-implemented. The error handling for template file operations is comprehensive, including proper encoding specification and validation of the required "user" key in the YAML structure.

Optional: Consider adding a check for empty template values to catch configuration errors earlier:

```diff
 if "user" not in data:
     raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
 self.input_template = data["user"]
+if not self.input_template or not self.input_template.strip():
+    raise ValueError(f"Template 'user' field is empty in: {input_template_path}")
```
168-179: LGTM! Critical bug fixed. The method signature change correctly addresses the critical `self.input_key` mutation bug identified in past reviews by accepting `input_key` as a parameter.

Consider adding `strict=True` to `zip()` for data integrity. The `zip()` call on line 177 should use `strict=True` (Python 3.10+) to catch mismatched input/output list lengths, which could indicate data quality issues.

Apply this diff:

```diff
 "messages": [
     [
         {"role": "user", "content": input_},
         {"role": "assistant", "content": output},
     ]
-    for input_, output in zip(examples[input_key], examples[self.output_key])
+    for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
 ]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.2)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
266-266: LGTM! The propagation of `input_template_path` from configuration to `PromptResponseDataset` is implemented correctly with appropriate default handling.
Actionable comments posted: 0
♻️ Duplicate comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
181-186: Previous review comment still applies. The concerns raised in the previous review regarding missing error handling, undocumented multi-key support (line 182 now uses a semicolon separator), and lack of validation remain valid.
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
103-109: Add error handling for YAML parsing failures. While the code checks for the 'user' key, it doesn't handle potential YAML parsing errors or empty template content. If the YAML file is malformed or the template is empty, users will encounter unclear errors during dataset processing.

Apply this diff to add validation:

```diff
 self.input_template = None
 if input_template_path:
-    with open(input_template_path, "rt", encoding="utf-8") as fin:
-        data = yaml.safe_load(fin)
-    if "user" not in data:
-        raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
-    self.input_template = data["user"]
+    try:
+        with open(input_template_path, "rt", encoding="utf-8") as fin:
+            data = yaml.safe_load(fin)
+        if not data or not isinstance(data, dict):
+            raise ValueError(f"Template file must contain a YAML dictionary: {input_template_path}")
+        if "user" not in data:
+            raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
+        self.input_template = data["user"]
+        if not self.input_template or not self.input_template.strip():
+            raise ValueError(f"Template 'user' field is empty: {input_template_path}")
+    except (FileNotFoundError, yaml.YAMLError) as e:
+        raise ValueError(f"Failed to load template file {input_template_path}: {e}")
```
168-170: LGTM: Signature change is correct. The addition of the `input_key` parameter is necessary and correctly enables dynamic input key selection when templates are used.

Optional: Consider adding `strict=True` to the `zip()` call at line 177 to ensure batch consistency:

```diff
-for input_, output in zip(examples[input_key], examples[self.output_key])
+for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
```

This provides an extra safeguard, though HuggingFace datasets batching already ensures equal-length lists.

Also applies to: 177-177
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.2)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (3)
nemo_skills/training/nemo_rl/start_sft.py (3)
25-25: LGTM! The `yaml` import is necessary for loading the template file and is used correctly at line 106.
141-149: LGTM: Critical mutation issue resolved. The introduction of the `current_input_key` local variable correctly addresses the previous critical issue where `self.input_key` was mutated during processing, which would have caused incorrect behavior with validation datasets.

Note: The assertion at line 143 still lacks a descriptive message, as flagged in a previous review comment.
266-266: LGTM! The parameter forwarding correctly passes the `input_template_path` from the data configuration to the `PromptResponseDataset` constructor with a safe default.
```python
output_key: str = "output",
num_proc: int | None = None,
force_reprocess: bool = False,
input_template_path: str | None = None,
```
also wouldn't this fail for typical cases when e.g. you have keys like problem and generation in your file, but the logic for setting defaults checks for input / output or messages? So that part should also be updated
@Kipok I didn't understand this. When a user passes input_template_path, their provided input_key field must match the keys present in the input_template. Why would having keys like problem and generation cause a failure?
I mean the logic in detect_data_format function
Actionable comments posted: 1
♻️ Duplicate comments (1)
nemo_skills/training/nemo_rl/start_sft.py (1)
181-186: Add error handling and document multi-key support. The method can fail with cryptic errors if the template references missing keys, examples are empty, or the template syntax is invalid. Additionally, the semicolon-separated multi-key support is undocumented.

Apply this diff to add error handling and documentation:

```diff
 def apply_input_template(self, examples: dict[str, list[Any]]) -> dict[str, list[str]]:
+    """Apply input template to examples, supporting semicolon-separated input keys.
+
+    Args:
+        examples: Batched examples dict with input keys
+
+    Returns:
+        Dict with "formatted_input" key containing formatted strings
+
+    Raises:
+        KeyError: If template references keys not in examples
+        ValueError: If examples are empty or template syntax is invalid
+    """
     keys = [k.strip() for k in self.input_key.split(";")]
+
+    # Validate all keys exist
+    missing_keys = [k for k in keys if k not in examples]
+    if missing_keys:
+        raise KeyError(
+            f"Template references missing keys: {missing_keys}. "
+            f"Available keys: {list(examples.keys())}"
+        )
+
+    # Validate non-empty
+    if not examples or not examples[keys[0]]:
+        return {"formatted_input": []}
+
-    examples["formatted_input"] = [
-        self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
-    ]
+    try:
+        examples["formatted_input"] = [
+            self.input_template.format(**{k: examples[k][i] for k in keys})
+            for i in range(len(examples[keys[0]]))
+        ]
+    except (KeyError, ValueError) as e:
+        raise ValueError(
+            f"Failed to apply template: {e}. Template: {self.input_template[:100]}..."
+        ) from e
     return examples
```
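The up-front key validation requested in this comment can be shown standalone. Assumptions: the function name `validate_input_keys` and the error wording are illustrative, only the semicolon-separated `input_key` convention comes from the PR:

```python
# Hedged sketch of the validation the review asks for: check every key named
# in a semicolon-separated input_key before attempting to format anything.
def validate_input_keys(input_key: str, examples: dict) -> list:
    keys = [k.strip() for k in input_key.split(";")]
    missing = [k for k in keys if k not in examples]
    if missing:
        raise KeyError(
            f"Template references missing keys: {missing}. "
            f"Available keys: {list(examples.keys())}"
        )
    return keys

batch = {"problem": ["p1"], "context": ["c1"]}
keys = validate_input_keys("problem; context", batch)

try:
    validate_input_keys("problem; generation", batch)
    missing_caught = False
except KeyError:
    missing_caught = True
```

Validating before the per-row formatting loop turns a confusing mid-batch failure into a single, descriptive error that names the offending columns.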
🧹 Nitpick comments (2)
nemo_skills/training/nemo_rl/start_sft.py (2)
141-149: Critical bug fixed! Consider adding an assertion message for better debugging. Excellent fix! The code now correctly uses a local variable `current_input_key` instead of mutating `self.input_key`, which resolves the critical bug from previous reviews where validation datasets would fail.

Optional improvement: Add a descriptive message to the assertion at line 143 for better debugging:

```diff
-assert "messages" not in dataset.column_names
+assert "messages" not in dataset.column_names, (
+    "Cannot apply input_template to datasets that already have 'messages' format. "
+    "Input templates are only supported for input/output format datasets."
+)
```
177-177: LGTM! Consider adding `strict=True` to `zip()` for safety. The code correctly uses the passed `input_key` parameter.

Optional improvement: Add `strict=True` to catch length mismatches between input and output data:

```diff
-for input_, output in zip(examples[input_key], examples[self.output_key])
+for input_, output in zip(examples[input_key], examples[self.output_key], strict=True)
```

Note: This requires Python 3.10+. If targeting earlier versions, this change should be skipped.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/training/nemo_rl/start_sft.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
nemo_skills/training/nemo_rl/start_sft.py
108-108: Avoid specifying long messages outside the exception class
(TRY003)
177-177: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (5)
nemo_skills/training/nemo_rl/start_sft.py (5)
39-40: LGTM! The import correctly uses nemo-skills utilities as requested, and the alias avoids naming conflicts with the existing `load_config` import.
91-91: LGTM! The parameter is well-named and appropriately optional.
156-156: LGTM! Correctly passes the input key to `add_messages_key` via `fn_kwargs`, completing the fix for the validation dataset issue.
168-170: LGTM! The signature correctly accepts `input_key` as a parameter, making the method more flexible and fixing the validation dataset bug.
266-266: LGTM! The template path is correctly propagated from the data config to the dataset constructor.
```python
self.input_template = None
if input_template_path:
    input_template_config = load_prompt_config(input_template_path)
    if "user" not in input_template_config:
        raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
    self.input_template = input_template_config["user"]
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify error handling in load_prompt_config.
The template loading logic is sound, but ensure that load_prompt_config properly handles file errors (missing files, permission issues, invalid YAML) to provide clear error messages to users.
Run the following script to examine the error handling in load_prompt_config:
Note: Line 108 has a long error message in the exception. Consider extracting it to a variable for better style compliance.

```diff
-if "user" not in input_template_config:
-    raise KeyError(f"'user' key is missing in the YAML file: {input_template_path}")
+if "user" not in input_template_config:
+    msg = f"'user' key is missing in the YAML file: {input_template_path}"
+    raise KeyError(msg)
```

🏁 Script executed:
```shell
#!/bin/bash
# Description: Check error handling in load_prompt_config function
# Find and display the load_config function implementation
ast-grep --pattern $'def load_config($$$) {
  $$$
}'
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Locate the implementation of load_prompt_config in the Python codebase
rg -nP --type=py "def load_prompt_config" -C3
```

Length of output: 0
🏁 Script executed:

```shell
#!/bin/bash
# Search for load_prompt_config references across the repository
rg -n "load_prompt_config"
```

Length of output: 237
🏁 Script executed:

```shell
#!/bin/bash
# Display the implementation of load_config in the prompt utils module
rg -n "def load_config" -C10 nemo_skills/prompt
```

Length of output: 1538
🏁 Script executed:

```shell
#!/bin/bash
# Show full implementation of load_config to inspect error handling
sed -n '350,450p' nemo_skills/prompt/utils.py
```

Length of output: 1814
Add error handling in load_config and extract literal messages to variables
- In nemo_skills/prompt/utils.py (load_config), wrap the `open(config_path)` and `yaml.safe_load` calls in a try/except that catches FileNotFoundError, PermissionError, and yaml.YAMLError, then rethrows with a clear message including `config_path`.
- In nemo_skills/training/nemo_rl/start_sft.py (lines 108–109), extract the KeyError message into a `msg` variable before raising.
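The wrap-and-chain pattern requested in the first bullet can be sketched generically. Assumption: `load_config` below is a stand-in illustrating the pattern, not the actual nemo_skills/prompt/utils.py implementation, and it omits the YAML-specific handling:

```python
# Sketch of "rethrow with a clear message including config_path" while
# keeping the original exception chained for debugging.
def load_config(config_path: str) -> str:
    try:
        with open(config_path, "rt", encoding="utf-8") as fin:
            return fin.read()
    except (FileNotFoundError, PermissionError) as e:
        # "from e" chains the original exception so the root cause stays
        # visible in the traceback while the message names the config path.
        raise ValueError(f"Failed to load config file {config_path}: {e}") from e

try:
    load_config("/nonexistent/config.yaml")
except ValueError as e:
    chained = isinstance(e.__cause__, FileNotFoundError)
    message = str(e)
```

Chaining with `from e` is what makes the rethrow reviewer-friendly: users see the path-bearing message, while the traceback still shows the OS-level cause.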
🧰 Tools
🪛 Ruff (0.13.3)
108-108: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In nemo_skills/prompt/utils.py (around the load_config function) wrap the file
open and yaml.safe_load calls in a try/except that catches FileNotFoundError,
PermissionError, and yaml.YAMLError and rethrows a new exception (or raise) with
a clear message that includes the config_path; ensure the original exception is
chained. In nemo_skills/training/nemo_rl/start_sft.py around lines 104–109,
extract the KeyError message into a local variable msg (e.g. msg = f"...") and
raise KeyError(msg) instead of inlining the formatted string in the raise
statement.
Kipok left a comment:
added a few more comments. If you want to merge this please also add a new gpu test in https://github.com/NVIDIA-NeMo/Skills/blob/main/tests/gpu-tests/test_train.py to ensure this is being tested in the ci
```python
output_key: str = "output",
num_proc: int | None = None,
force_reprocess: bool = False,
input_template_path: str | None = None,
```
I mean the logic in detect_data_format function
```python
print(f"[Map] Processing {split_name} dataset from: {path}")
dataset = load_dataset("json", data_files=str(path))["train"]

current_input_key = self.input_key
```
@wasiahmad this is an important thing to fix if you want to merge this
@wasiahmad do you still need this feature? If not, maybe we close this without merging for now as some comments are still unaddressed here
```python
examples["formatted_input"] = [
    self.input_template.format(**{k: examples[k][i] for k in keys}) for i in range(len(examples[keys[0]]))
]
```

accessing examples[k][i] will fail if any key k doesn't exist in the dataset

Suggested change:

```python
formatted_inputs = []
for i in range(len(examples[keys[0]])):
    format_dict = {k: examples[k][i] for k in keys}
    formatted_inputs.append(self.input_template.format(**format_dict))
examples["formatted_input"] = formatted_inputs
```
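The failure mode flagged here can be reproduced in isolation. The column names below are illustrative, only the `examples[k][i]` access pattern comes from the PR:

```python
# Building the format dict with examples[k][i] raises a bare KeyError when a
# configured key is absent from the dataset batch.
examples = {"problem": ["p1", "p2"]}  # note: no "generation" column
keys = ["problem", "generation"]

try:
    rows = [
        {k: examples[k][i] for k in keys}
        for i in range(len(examples[keys[0]]))
    ]
    key_error = None
except KeyError as e:
    key_error = str(e)  # just 'generation', with no dataset context

assert key_error == "'generation'"
```

This is why the earlier review comments ask for an explicit missing-key check: a bare `KeyError: 'generation'` deep inside `Dataset.map` gives the user no hint about which configuration field to fix.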
Additional Comments (2)
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Actionable comments posted: 1
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/evaluation/evaluator/audio.py (1)
376-386: ⚠️ Potential issue | 🟡 Minor

Return a stable ASR-PC metric schema for `missing_generation`. At Line 376 through Line 386, the ASR-PC early return only includes `wer`, while the normal ASR-PC path returns `wer`, `wer_c`, `wer_pc`, and `per`. This can break downstream consumers expecting ASR-PC keys.

Suggested fix:

```diff
-# ASR / ASR-PC
-return {**base, "wer": 1.0}
+if task_type == "ASR-PC":
+    return {**base, "wer": 1.0, "wer_c": 1.0, "wer_pc": 1.0, "per": 1.0}
+return {**base, "wer": 1.0}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/evaluation/evaluator/audio.py` around lines 376 - 386, The early-return for missing_generation when task_type is "ASR-PC" only returns "wer" but downstream expects the full ASR-PC metric schema; update the branch in the block that checks task_type and generation (the if task_type in [...] and not generation) so that when task_type == "ASR-PC" it returns the complete set of keys used by the normal ASR-PC path (e.g., wer, wer_c, wer_pc, per) with appropriate default values (e.g., 1.0 for error rates) by merging them into the existing base dict instead of returning only wer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/evaluation/evaluator/audio.py`:
- Around line 166-167: The code currently always applies Whisper normalization
by calling preprocess_asr_text(reference) and preprocess_asr_text(hypothesis)
and calls evaluate_asr(...) without honoring config.apply_whisper_normalization
or config.normalization_mode; change the flow so that before preprocessing (or
before calling evaluate_asr) you check config.apply_whisper_normalization and
config.normalization_mode: if apply_whisper_normalization is True and
normalization_mode == "whisper" call the Whisper-specific normalization routine
(or call preprocess_asr_text with a mode parameter), if
apply_whisper_normalization is False skip Whisper normalization and use the
standard text preprocessing, and if normalization_mode is set to an unsupported
value raise an explicit error; also update the evaluate_asr(...) call site to
pass or respect these config flags rather than ignoring them so user settings
are enforced (refer to preprocess_asr_text, evaluate_asr,
config.apply_whisper_normalization, and config.normalization_mode).
---
Outside diff comments:
In `@nemo_skills/evaluation/evaluator/audio.py`:
- Around line 376-386: The early-return for missing_generation when task_type is
"ASR-PC" only returns "wer" but downstream expects the full ASR-PC metric
schema; update the branch in the block that checks task_type and generation (the
if task_type in [...] and not generation) so that when task_type == "ASR-PC" it
returns the complete set of keys used by the normal ASR-PC path (e.g., wer,
wer_c, wer_pc, per) with appropriate default values (e.g., 1.0 for error rates)
by merging them into the existing base dict instead of returning only wer.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
nemo_skills/dataset/asr-leaderboard/__init__.py
nemo_skills/evaluation/evaluator/audio.py
```python
ref_std = preprocess_asr_text(reference)
hyp_std = preprocess_asr_text(hypothesis)
```
apply_whisper_normalization / normalization_mode are effectively ignored in the new flow.
At Line 166 and Line 167, ASR-PC standard WER always uses Whisper normalization when enabled, regardless of normalization_mode. At Line 399, ASR always calls evaluate_asr(...) without honoring config.apply_whisper_normalization. This silently ignores user-provided settings.
Suggested fix (make config effective and fail on unsupported modes):

```diff
 def evaluate_asr_pc(
     reference: str, hypothesis: str, normalize_standard_wer: bool = True, normalization_mode: str = "standard"
 ) -> dict[str, Any]:
@@
-    if normalize_standard_wer:
-        ref_std = preprocess_asr_text(reference)
-        hyp_std = preprocess_asr_text(hypothesis)
+    if normalize_standard_wer:
+        if normalization_mode == "standard":
+            ref_std = preprocess_asr_text(reference)
+            hyp_std = preprocess_asr_text(hypothesis)
+        elif normalization_mode == "none":
+            ref_std = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
+            hyp_std = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
+        else:
+            raise ValueError(f"Unsupported normalization_mode: {normalization_mode}")
     else:
         ref_std = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
         hyp_std = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
```

```diff
-def evaluate_asr(reference: str, hypothesis: str) -> dict[str, Any]:
+def evaluate_asr(reference: str, hypothesis: str, apply_whisper_normalization: bool = True) -> dict[str, Any]:
@@
-    ref = preprocess_asr_text(reference)
-    hyp = preprocess_asr_text(hypothesis)
+    if apply_whisper_normalization:
+        ref = preprocess_asr_text(reference)
+        hyp = preprocess_asr_text(hypothesis)
+    else:
+        ref = normalize_whitespace(re.sub(r"[^\w\s]", "", reference.lower()))
+        hyp = normalize_whitespace(re.sub(r"[^\w\s]", "", hypothesis.lower()))
```

```diff
-    elif task_type == "ASR":
-        metrics = evaluate_asr(expected_answer, generation)
+    elif task_type == "ASR":
+        metrics = evaluate_asr(
+            expected_answer,
+            generation,
+            apply_whisper_normalization=config.apply_whisper_normalization,
+        )
     updates.update(metrics)
```

As per coding guidelines "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically".
Also applies to: 399-399
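The mode dispatch proposed in the suggested fix can be sketched standalone. Assumptions: `normalize_whitespace` mirrors the helper named in the diff, while the lowercasing stand-in replaces the Whisper-specific `preprocess_asr_text`; the mode names "standard" and "none" follow the diff:

```python
import re

# Minimal sketch of a normalization_mode dispatch that fails loudly on
# unsupported values instead of silently ignoring user configuration.
def normalize_whitespace(text: str) -> str:
    return " ".join(text.split())

def normalize(text: str, mode: str) -> str:
    if mode == "standard":
        return normalize_whitespace(text.lower())  # stand-in for Whisper normalization
    if mode == "none":
        # strip punctuation, lowercase, collapse whitespace
        return normalize_whitespace(re.sub(r"[^\w\s]", "", text.lower()))
    raise ValueError(f"Unsupported normalization_mode: {mode}")

stripped = normalize("Hello,   World!", "none")
assert stripped == "hello world"

try:
    normalize("x", "whisper-ish")
    rejected = False
except ValueError:
    rejected = True
```

Raising on unknown modes is the point of the guideline quoted above: a silently ignored `normalization_mode` produces metrics the user did not ask for.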
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/evaluation/evaluator/audio.py` around lines 166 - 167, The code
currently always applies Whisper normalization by calling
preprocess_asr_text(reference) and preprocess_asr_text(hypothesis) and calls
evaluate_asr(...) without honoring config.apply_whisper_normalization or
config.normalization_mode; change the flow so that before preprocessing (or
before calling evaluate_asr) you check config.apply_whisper_normalization and
config.normalization_mode: if apply_whisper_normalization is True and
normalization_mode == "whisper" call the Whisper-specific normalization routine
(or call preprocess_asr_text with a mode parameter), if
apply_whisper_normalization is False skip Whisper normalization and use the
standard text preprocessing, and if normalization_mode is set to an unsupported
value raise an explicit error; also update the evaluate_asr(...) call site to
pass or respect these config flags rather than ignoring them so user settings
are enforced (refer to preprocess_asr_text, evaluate_asr,
config.apply_whisper_normalization, and config.normalization_mode).
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/dataset/asr-leaderboard/__init__.py`:
- Around line 17-18: The evaluator currently uses sample.get("task_type",
"unknown") which silently skips WER computation; update evaluate_sample in
nemo_skills/evaluation/evaluator/audio.py to access sample["task_type"] directly
and add explicit validation (raise a clear exception) for allowed values (e.g.,
"ASR", "ASR-PC", "ASR_LEADERBOARD") before any metric computation so
missing/invalid task_type fails fast instead of falling through to the else
branch that omits WER.
```python
# Uses the audio evaluator which computes WER with Whisper-style text normalization
# Data samples should have task_type="ASR" for proper WER calculation
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# 1) Locate task_type handling
echo "=== Search 1: task_type handling ==="
rg -n -C3 --type=py '\btask_type\b' 2>/dev/null | head -100
# 2) Locate audio evaluation / WER entry points
echo ""
echo "=== Search 2: Audio eval / WER entry points ==="
rg -n -C3 --type=py 'WER|word.?error|whisper|eval_type\s*=\s*audio|METRICS_TYPE\s*=\s*"audio"' 2>/dev/null | head -100
# 3) Look specifically for ASR guards/assertions
echo ""
echo "=== Search 3: ASR guards/assertions ==="
rg -n -C3 --type=py 'assert .*task_type|if .*task_type.*ASR|task_type\s*==\s*["'"'"']ASR["'"'"']' 2>/dev/null | head -100
```

Repository: NVIDIA-NeMo/Skills
Length of output: 18378
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Get the complete evaluate_sample function
echo "=== Full evaluate_sample function ==="
rg -A 70 'def evaluate_sample\(sample: dict' nemo_skills/evaluation/evaluator/audio.py
# Check if there's any else/default handling for unmatched task_type
echo ""
echo "=== Check for else/default handling ==="
rg -A 80 'elif task_type == "PC-Rate"' nemo_skills/evaluation/evaluator/audio.py
```

Repository: NVIDIA-NeMo/Skills
Length of output: 4129
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
# Find where evaluate_sample is called
echo "=== Callers of evaluate_sample ==="
rg -B5 -A3 'evaluate_sample\(' nemo_skills/evaluation/evaluator/audio.py | head -60
# Check if there's any validation of task_type before calling evaluate_sample
echo ""
echo "=== Check for task_type validation before evaluate_sample call ==="
rg -B10 'evaluate_sample' nemo_skills/evaluation/evaluator/audio.py | grep -A 10 'def '
```

Repository: NVIDIA-NeMo/Skills
Length of output: 1613
Enforce task_type="ASR" validation in the evaluator, not only in comments.
The evaluate_sample function at nemo_skills/evaluation/evaluator/audio.py:472 uses .get("task_type", "unknown") with a silent default. If task_type is missing or doesn't match expected values (ASR, ASR-PC, ASR_LEADERBOARD, etc.), the code silently falls through to the else clause (lines 528–531), which skips WER computation and returns minimal fields. This contradicts the documented requirement in nemo_skills/dataset/asr-leaderboard/__init__.py:17-18 that data samples should have task_type="ASR" for proper WER calculation.
Use direct access sample["task_type"] instead of .get() and add explicit validation before metric computation to fail fast when task_type is missing or invalid, preventing silent metric loss.
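The fail-fast validation described above can be sketched as follows. Assumption: the allowed set is illustrative and would need to match the evaluator's real task types:

```python
# Hedged sketch of fail-fast task_type validation: direct access raises on a
# missing key, and an explicit allow-list rejects unknown values up front.
ALLOWED_TASK_TYPES = {"ASR", "ASR-PC", "ASR_LEADERBOARD"}

def get_task_type(sample: dict) -> str:
    task_type = sample["task_type"]  # KeyError if missing: fail fast
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(
            f"Unsupported task_type {task_type!r}; "
            f"expected one of {sorted(ALLOWED_TASK_TYPES)}"
        )
    return task_type

ok = get_task_type({"task_type": "ASR"})

try:
    get_task_type({"task_type": "unknown"})
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```

Compared with `.get("task_type", "unknown")`, this surfaces bad data at the first sample instead of silently returning metrics with the WER fields missing.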
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/dataset/asr-leaderboard/__init__.py` around lines 17 - 18, The
evaluator currently uses sample.get("task_type", "unknown") which silently skips
WER computation; update evaluate_sample in
nemo_skills/evaluation/evaluator/audio.py to access sample["task_type"] directly
and add explicit validation (raise a clear exception) for allowed values (e.g.,
"ASR", "ASR-PC", "ASR_LEADERBOARD") before any metric computation so
missing/invalid task_type fails fast instead of falling through to the else
branch that omits WER.
probably won't merge this @wasiahmad as we are trying to upstream everything into gym / nemo-rl. If you want to keep this functionality please create a pr into nemo-rl directly (probably not using our prompt template, but you can introduce similar logic there). In the future, we will try to always use a built-in script and if we need changes, they'd need to go to nemo-rl directly to avoid divergence
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Note
Adds optional YAML-driven input templating to combine multiple fields into a formatted_input used for message construction, configurable via data.input_template_path.
- Loads the template via `load_prompt_config` and renders `formatted_input` from multiple fields (semicolon-separated `input_key`). Requires a `user` key in the template.
- Uses `formatted_input` when provided; `add_messages_key` updated to accept an `input_key` parameter.
- `apply_input_template` map step applied before message creation; asserts no pre-existing `messages`.
- `data.input_template_path` accepted and passed into dataset setup.

Written by Cursor Bugbot for commit 2353efe. This will update automatically on new commits. Configure here.