Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

The PR refactors the ASR evaluation pipeline by centralizing package dependencies into a requirements file, separating evaluation and generation arguments, enhancing audio preprocessing with filtering and metadata exposure, and restructuring the audio evaluator to use lazy-loaded normalization with unified return payloads.

Changes

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (6)
🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-23T17:56:57.556Z, applied to files:
📚 Learning: 2025-12-12T16:09:53.870Z, applied to files:
🧬 Code graph analysis (2)

nemo_skills/evaluation/evaluator/compute_eval.py (1)
nemo_skills/evaluation/evaluator/audio.py (1)

🪛 Ruff (0.14.11)

nemo_skills/evaluation/evaluator/audio.py

177-182: Consider moving this statement to an else block (TRY300)

183-183: Do not catch blind exception: Exception (BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (12)
✏️ Tip: You can disable this entire section in your review settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- README.md
- docs/evaluation/index.md
- docs/evaluation/speech-audio.md
- docs/index.md
- nemo_skills/dataset/asr-leaderboard/__init__.py
- nemo_skills/dataset/asr-leaderboard/prepare.py
- nemo_skills/evaluation/evaluator/audio.py
- nemo_skills/pipeline/prepare_data.py
- tests/gpu-tests/test_eval.py
🧰 Additional context used
🪛 LanguageTool
docs/evaluation/speech-audio.md
[style] ~37-~37: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 2899 characters long)
Context: ...default** to ensure proper evaluation. !!! warning "Running without audio files" ...
(EN_EXCESSIVE_EXCLAMATION)
🪛 Ruff (0.14.10)
nemo_skills/dataset/asr-leaderboard/prepare.py
115-115: Avoid specifying long messages outside the exception class
(TRY003)
125-125: Do not catch blind exception: Exception
(BLE001)
171-171: Consider [*list(DATASET_CONFIGS.keys()), "all"] instead of concatenation
Replace with [*list(DATASET_CONFIGS.keys()), "all"]
(RUF005)
🔇 Additional comments (11)
tests/gpu-tests/test_eval.py (1)
46-46: LGTM! Appropriately excludes the new asr-leaderboard dataset from automated test coverage, consistent with the pattern for other datasets requiring explicit parameters or heavy preparation.
docs/index.md (1)
25-25: LGTM! Documentation correctly reflects the addition of asr-leaderboard as a new Speech & Audio benchmark alongside mmau-pro.
README.md (1)
22-22: LGTM! Documentation update is consistent with the corresponding changes in docs/index.md.
docs/evaluation/index.md (1)
13-13: LGTM! Documentation update is consistent with the broader changes introducing asr-leaderboard support.
nemo_skills/dataset/asr-leaderboard/__init__.py (1)
19-21: LGTM! Configuration constants follow the standard pattern used by other datasets in the codebase. The settings appropriately specify the dataset group, metrics type, and default generation arguments for ASR Leaderboard evaluation.
nemo_skills/dataset/asr-leaderboard/prepare.py (3)
112-162: LGTM! The dataset preparation logic is well-structured with appropriate error handling, filtering for invalid content (short audio, non-speech, specific speaker IDs), and clear logging. The bare exception catch at line 125 is acceptable for gracefully handling dataset loading failures.
165-218: LGTM! The CLI implementation correctly handles dataset selection, audio saving options, and combines individual dataset JSONL files into a unified test.jsonl for evaluation.
63-109: Path format is consistent with documentation; verification recommended for the deployment environment only. The hardcoded audio path at line 93 matches the module docstring (line 18) and is correctly designed for the /dataset mount point documented in speech-audio.md. However, note that the ASR_LEADERBOARD evaluator only compares text transcriptions and does not load audio from this path. If the inference model needs to load audio files, ensure the deployment environment mounts data at /dataset/asr-leaderboard/data/ as specified.

nemo_skills/pipeline/prepare_data.py (1)
34-34: LGTM!Appropriately adds asr-leaderboard to the list of datasets requiring a data directory, consistent with its audio file handling requirements and the pattern used for other large datasets like mmau-pro.
docs/evaluation/speech-audio.md (2)
10-19: No issues found; dataset configuration aligns with implementation. All 8 datasets (librispeech_clean, librispeech_other, voxpopuli, tedlium, gigaspeech, spgispeech, earnings22, ami) are correctly defined in nemo_skills/dataset/asr-leaderboard/prepare.py with matching names and order. The __init__.py file is properly configured with audio metrics and WER evaluation as documented.
1-256: All referenced implementation files verified as existing and correctly configured. The documentation accurately references:

- nemo_skills/dataset/asr-leaderboard/__init__.py and prepare.py exist and define the 8 datasets (librispeech_clean, librispeech_other, voxpopuli, tedlium, gigaspeech, spgispeech, earnings22, ami) exactly as documented
- nemo_skills/evaluation/evaluator/audio.py exists
- nemo_skills/pipeline/prepare_data.py exists
- MMAU-Pro structure correctly implements the three benchmark categories (closed_form, open_ended, instruction_following) as documented
Force-pushed from b41e2db to 2db4594 (Compare)
Force-pushed from 2db4594 to d497d8b (Compare)
Greptile Summary

This PR refactors the ASR Leaderboard evaluation to use task_type="ASR" (previously ASR_LEADERBOARD).

Key improvements:

Most previously identified issues have been addressed or acknowledged by the development team.

Confidence Score: 4/5

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant PrepareScript as prepare.py
    participant HFDataset as HuggingFace Dataset
    participant AudioEval as audio.py
    participant WhisperNorm as whisper-normalizer
    participant Jiwer as jiwer
    User->>PrepareScript: Run dataset preparation
    PrepareScript->>HFDataset: Load dataset (librispeech, voxpopuli, etc.)
    HFDataset-->>PrepareScript: Return audio + transcription entries
    loop For each entry
        PrepareScript->>PrepareScript: Check audio duration > 0.1s
        PrepareScript->>PrepareScript: Filter speaker_id in SKIP_SPEAKER_IDS
        PrepareScript->>PrepareScript: Filter non-speech tokens (SIL, MUSIC, etc.)
        PrepareScript->>PrepareScript: Validate audio array not empty
        PrepareScript->>PrepareScript: Save audio file + create JSONL entry
        Note over PrepareScript: task_type="ASR" (was ASR_LEADERBOARD)
    end
    User->>AudioEval: Evaluate ASR predictions
    AudioEval->>AudioEval: Check task_type="ASR"
    AudioEval->>AudioEval: evaluate_asr(reference, hypothesis)
    AudioEval->>WhisperNorm: preprocess_asr_text(reference)
    WhisperNorm-->>AudioEval: Normalized reference text
    AudioEval->>WhisperNorm: preprocess_asr_text(hypothesis)
    WhisperNorm-->>AudioEval: Normalized hypothesis text
    AudioEval->>AudioEval: Replace empty strings with "empty"
    AudioEval->>Jiwer: Calculate WER
    Jiwer-->>AudioEval: WER score
    AudioEval-->>User: Return {wer, is_correct, text, pred_text}
```
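The WER step at the end of the diagram boils down to a word-level edit distance. A minimal pure-Python sketch of the metric that jiwer computes (illustrative only, not the library's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word tokens via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One insertion against a 3-word reference: WER of 1/3
print(word_error_rate("the cat sat", "the cat sat on"))
```

Note the division by the reference length: an empty reference would divide by zero, which is one reason the pipeline substitutes a placeholder for empty strings before scoring.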
Additional Comments (2)
Force-pushed from d497d8b to f42dd55 (Compare)
Force-pushed from aa825fe to b49281b (Compare)
Force-pushed from b49281b to 2d67aa4 (Compare)
Additional Comments (1)
Consider sanitizing the sample_id:
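A sketch of what the suggested sample_id sanitization might look like (hypothetical helper; the actual field format is not shown in this thread):

```python
import re

def sanitize_sample_id(sample_id: str) -> str:
    """Replace filesystem-unsafe characters so the id can be used in file names."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", sample_id)

print(sanitize_sample_id("spk1/utt 42?.wav"))  # slashes, spaces, '?' become '_'
```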
```python
# Store normalized texts before empty substitution
text = ref
pred_text = hyp
```
Potential issue with storing empty normalized text. The code stores the normalized text before replacing empty strings with "empty", which means text and pred_text could be empty strings in the returned dictionary.
This could cause issues downstream if code expects non-empty strings. The pattern is inconsistent with other evaluation functions in this file:
- evaluate_asr_pc() (lines 122-123) stores the normalized text BEFORE empty substitution
- evaluate_translation() (lines 171-172) stores the original .strip() text
- evaluate_cer() (lines 201-202) stores the original text without any empty handling
The inconsistency suggests this might not be intentional. If empty strings in text/pred_text fields are acceptable, this is fine. Otherwise, consider storing after the empty substitution or storing the original text like other functions do.
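For illustration, the substitution pattern under discussion can be sketched as follows (hypothetical helper name; that the downstream scorer needs non-empty references is an assumption drawn from the pattern's purpose):

```python
def substitute_empty(text: str, placeholder: str = "empty") -> str:
    """Replace an empty or whitespace-only string with a placeholder token
    so the downstream WER computation always sees a non-empty string."""
    return text if text.strip() else placeholder

print(substitute_empty("hello world"))  # unchanged
print(substitute_empty(""))             # "empty"
```

Storing text/pred_text after this call (rather than before) would guarantee non-empty values in the returned dictionary.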
Jorjeous left a comment
version of jiwer
other LGTM, or will be overwritten later
Force-pushed from 2d67aa4 to 722e79c (Compare)
```python
@lru_cache(maxsize=1)
def _get_english_normalizer():
    """Lazily initialize and cache the English text normalizer."""
    from whisper_normalizer.english import EnglishTextNormalizer

    return EnglishTextNormalizer()
```
Import inside a function with lru_cache could cause issues: if the import fails after the first call, the cached normalizer becomes stale. Move the import to module level or inside the returned function.
That's a problem to be solved by the user.
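As background for the thread above, a stdlib-only sketch of how lru_cache(maxsize=1) memoizes a lazily constructed object (a generic example, not the repository's normalizer):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_expensive_object():
    """Build the object on the first call; later calls return the cached instance."""
    print("initializing")  # executed only once
    return object()

a = get_expensive_object()
b = get_expensive_object()
print(a is b)  # True: both names refer to the cached instance
```

Note that lru_cache only caches successful returns: a call that raises (for example, a failed import) stores nothing, so the next call re-executes the function body.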
```python
speaker_id = entry.get("speaker_id", "")
expected = formatted["expected_answer"]
if expected and speaker_id not in SKIP_SPEAKER_IDS and not is_nonspeech_only(expected):
```
The logic checks expected for truthiness, but is_nonspeech_only() already handles empty strings correctly (it returns False for empty), so the expected check is redundant for the is_nonspeech_only() branch.
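A hedged sketch of the logic in question, assuming is_nonspeech_only treats a transcript as non-speech only when every token is a marker such as SIL or MUSIC (the real token set and helper live in prepare.py and may differ):

```python
NONSPEECH_TOKENS = {"SIL", "MUSIC", "NOISE"}  # assumed marker set, for illustration

def is_nonspeech_only(text: str) -> bool:
    """True only when the transcript is non-empty and every token is a marker."""
    tokens = text.split()
    return bool(tokens) and all(t.strip("<>[]") in NONSPEECH_TOKENS for t in tokens)

# An empty string returns False, which is why the separate truthiness
# check on expected is redundant for this branch.
print(is_nonspeech_only(""))             # False
print(is_nonspeech_only("<SIL> <SIL>"))  # True
print(is_nonspeech_only("hello world"))  # False
```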
Additional Comments (2)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Force-pushed from 7285847 to 98ffac2 (Compare)
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Force-pushed from 98ffac2 to b262889 (Compare)
Greptile found no issues! From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
@coderabbitai help
Chat

There are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
Status, support, documentation and community
@coderabbitai full review
✅ Actions performed

Full review triggered.
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.