Adding clan PR with AudioBench and Librispeech PC. by Jorjeous · Pull Request #1103 · NVIDIA-NeMo/Skills

Jorjeous · 2025-12-12T15:59:50Z

Rebasing audiobench and audio tasks to unified asr metrics

Summary by CodeRabbit

Release Notes

New Features
- Added AudioBench audio evaluation dataset with judge and non-judge evaluation modes
- Added LibriSpeech-PC ASR dataset support
- Enhanced audio metrics with judge-based scoring using 0-5 rating scale
Documentation
- Updated guidance for running audio evaluations without audio files
- Added dataset preparation instructions for AudioBench and LibriSpeech-PC datasets
Tests
- Added test coverage for new audio datasets

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…nified asr metrics file Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

coderabbitai · 2025-12-12T16:06:46Z

📝 Walkthrough

Walkthrough

This PR introduces support for AudioBench and LibriSpeech-PC audio datasets, including dataset preparation scripts that download, process, and convert audio samples into nemo-skills manifest format. It adds corresponding configuration modules, updates evaluation metrics to handle judge-based rating extraction, and registers the datasets in the preparation pipeline.

Changes

Cohort / File(s)	Summary
Documentation `docs/evaluation/speech-audio.md`	Updated MMAU-Pro data preparation command; replaced inline note with warning block describing running evaluations without audio files; added AudioBench and LibriSpeech-PC sections detailing dataset locations, splits, and data preparation guidance.
AudioBench Dataset Configuration `nemo_skills/dataset/audiobench/__init__.py` `nemo_skills/dataset/audiobench/judge/__init__.py` `nemo_skills/dataset/audiobench/nonjudge/__init__.py`	Added module-level constants defining dataset group ("speechlm"), benchmark metadata, metrics type ("audio"), default splits, evaluation/generation arguments, and judge pipeline configuration for AudioBench judge and non-judge variants.
AudioBench Data Preparation `nemo_skills/dataset/audiobench/prepare.py`	New preparation script that loads AudioBench datasets from HuggingFace, extracts audio data, computes durations, optionally saves WAV files, and generates nemo-skills-compatible JSONL manifests with audio metadata and system/user messages for evaluation.
LibriSpeech-PC Dataset Configuration `nemo_skills/dataset/librispeech-pc/__init__.py`	Added module-level constants defining dataset group ("speechlm"), metrics type ("audio"), default split ("test-clean"), and evaluation/generation arguments.
LibriSpeech-PC Data Preparation `nemo_skills/dataset/librispeech-pc/prepare.py`	New preparation script that downloads LibriSpeech-PC manifests and audio splits, processes selected splits into nemo-skills JSONL format with transcription instructions, audio metadata, and expected answers for ASR evaluation.
Audio Evaluation Metrics `nemo_skills/evaluation/metrics/audio_metrics.py`	Extended judge result extraction to parse both "Rating: X" (0-5) and "Judgement: Yes/No" formats, returning `tuple[bool, float]` instead of `bool`; added judge\_ratings collection and judge\_score computation (mean rating \* 20) in metrics aggregation.
Judge Prompt Configurations `nemo_skills/prompt/config/judge/audiobench.yaml` `nemo_skills/prompt/config/judge/audiobench_binary.yaml`	Added two new judge prompt templates defining AudioBench evaluation with 0-5 rating scale and Yes/No binary format respectively, with explicit scoring criteria and output format specifications.
Audio Evaluator & Pipeline Integration `nemo_skills/evaluation/evaluator/audio.py` `nemo_skills/pipeline/prepare_data.py`	Extended missing-generation handling for AST/Translation/CER tasks with base response fields; added "librispeech-pc" and "audiobench" to DATASETS\_REQUIRE\_DATA\_DIR safety checks.
Tests `tests/test_datasets.py`	Added two new dataset entries ("audiobench" and "librispeech-pc") to DATASETS list for import-and-validate coverage.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Areas requiring extra attention:
- nemo_skills/evaluation/metrics/audio_metrics.py — Judge result extraction logic now handles multiple formats and returns a tuple; verify judge\_rating propagation through metrics aggregation and that score computation (mean \* 20) correctly maps 0-5 to 0-100 scale.
- nemo_skills/dataset/audiobench/prepare.py and nemo_skills/dataset/librispeech-pc/prepare.py — Verify data loading, audio duration computation, manifest entry construction, and JSONL serialization; ensure audio file paths and metadata fields align with downstream evaluation expectations.
- nemo_skills/evaluation/evaluator/audio.py — Confirm that Translation task handling through the AST branch does not break existing AST evaluation logic and that missing-generation responses include correct fields for all task types.

Possibly related PRs

Evaluation on OJBench #848 — Also adds new datasets (ojbench) to DATASETS\_REQUIRE\_DATA\_DIR and introduces corresponding dataset preparation/configuration modules; related at the code-level through parallel dataset onboarding patterns.

Suggested labels

run GPU tests

Suggested reviewers

gwarmstrong

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title mentions AudioBench and LibriSpeech PC but uses vague phrase 'clan PR' without clear context, and doesn't convey the main objective of rebasing to unified ASR metrics.	Revise title to be more descriptive, such as 'Add AudioBench and LibriSpeech-PC datasets with unified audio metrics' or 'Rebase audio tasks to unified ASR evaluation metrics'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 84.21% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch add-audiobench-librispeech

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

nemo_skills/evaluation/evaluator/audio.py (1)

328-334: Missing-generation payload is inconsistent for Translation (and likely AST/CER)

Line 328 expands the guard to Translation, but the early return still hard-codes "wer": 1.0. For translation tasks, this can (a) publish misleading metrics and/or (b) break downstream aggregation expecting "bleu" (and for CER expecting "cer").

Consider returning task-specific “worst” metrics based on task_type, e.g.:

-    if task_type in ["ASR", "ASR-PC", "AST", "Translation", "CER", "ASR_LEADERBOARD"] and not generation:
-        return {
-            "is_correct": False,
-            "wer": 1.0,
-            "error": "missing_generation",
-            "predicted_answer": "",
-        }
+    if task_type in ["ASR", "ASR-PC", "ASR_LEADERBOARD", "AST", "Translation", "CER"] and not generation:
+        base = {
+            "is_correct": False,
+            "error": "missing_generation",
+            "predicted_answer": "",
+        }
+        if task_type in ["AST", "Translation"]:
+            return {**base, "bleu": 0.0}
+        if task_type == "CER":
+            return {**base, "cer": 1.0}
+        # ASR / ASR-PC / ASR_LEADERBOARD
+        return {**base, "wer": 1.0}

🧹 Nitpick comments (5)

.gitignore (1)
48-51: Remove redundant .idea ignore entry (optional cleanup).

Line 48 (.idea) is redundant given Line 44-45 already ignore .idea/ and .idea/*. Consider dropping it to avoid confusion.
@@
-.idea
-
 # AudioBench repository (auto-cloned during data preparation)
 AudioBench/
nemo_skills/pipeline/prepare_data.py (1)

33-35: Good safety check expansion; consider making matching less brittle.

Adding librispeech-pc and audiobench to DATASETS_REQUIRE_DATA_DIR (Line 34) matches the intent. The if dataset in extra_arguments substring match can mis-detect; consider tokenizing/parsing args (or using a regex with word boundaries) and using a set for membership.

docs/evaluation/speech-audio.md (1)

275-338: Make CLI examples consistent about required --cluster when --data_dir is used.

Some examples include --data_dir without --cluster (e.g., Line 330-337). If ns prepare_data enforces cluster when data_dir is provided, mirror that in docs to avoid copy/paste failures.
nemo_skills/dataset/audiobench/judge/__init__.py (1)
32-39: Consider lowering the default judge serving footprint (8 GPUs) or making it “unset by default.”
Hardcoding 8 GPUs is a frequent footgun for anyone trying to run quick smoke tests locally/CI.
 JUDGE_PIPELINE_ARGS = {
     "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
     "server_type": "vllm",
-    "server_gpus": 8,
+    # Prefer a conservative default; allow run scripts to override for the official setup.
+    "server_gpus": 1,
     "server_args": "--max-model-len 8192 --gpu-memory-utilization 0.95",
 }
nemo_skills/dataset/audiobench/prepare.py (1)
194-227: git clone subprocess: add a timeout + prefer resolving git path (ruff S603/S607).
Inputs are currently constant, but this still spawns an external process with no timeout.
@@
-        subprocess.run(
-            ["git", "clone", audiobench_url, str(target_dir)],
+        git = shutil.which("git")
+        if not git:
+            print("✗ git command not found. Please install git or manually clone AudioBench.")
+            return False
+        subprocess.run(
+            [git, "clone", audiobench_url, str(target_dir)],
             check=True,
             capture_output=True,
             text=True,
+            timeout=1800,
         )

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 281c487 and f0ae98e.

📒 Files selected for processing (13)

.gitignore (1 hunks)
docs/evaluation/speech-audio.md (3 hunks)
nemo_skills/dataset/audiobench/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/judge/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/nonjudge/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/prepare.py (1 hunks)
nemo_skills/dataset/librispeech-pc/__init__.py (1 hunks)
nemo_skills/dataset/librispeech-pc/prepare.py (1 hunks)
nemo_skills/evaluation/evaluator/audio.py (2 hunks)
nemo_skills/pipeline/prepare_data.py (1 hunks)
nemo_skills/prompt/config/judge/audiobench.yaml (1 hunks)
nemo_skills/prompt/config/judge/audiobench_binary.yaml (1 hunks)
tests/test_datasets.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.14.8)

nemo_skills/dataset/audiobench/prepare.py

213-213: subprocess call: check for execution of untrusted input

(S603)

214-214: Starting a process with a partial executable path

(S607)

220-220: Consider moving this statement to an else block

(TRY300)

234-234: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

258-262: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

258-262: Avoid specifying long messages outside the exception class

(TRY003)

269-269: Do not catch blind exception: Exception

(BLE001)

270-270: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

270-270: Create your own exception

(TRY002)

270-270: Avoid specifying long messages outside the exception class

(TRY003)

336-336: Do not catch blind exception: Exception

(BLE001)

354-354: Do not catch blind exception: Exception

(BLE001)

511-511: Do not catch blind exception: Exception

(BLE001)

nemo_skills/dataset/librispeech-pc/prepare.py

39-39: Unused function argument: blocknum

(ARG001)

44-44: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pre-commit
GitHub Check: unit-tests

🔇 Additional comments (6)

nemo_skills/evaluation/evaluator/audio.py (1)

353-357: Routing Translation through the translation evaluation path looks correct

Line 353 cleanly aligns Translation with AST to share evaluate_translation(...) and update "predicted_answer" consistently.

nemo_skills/dataset/audiobench/nonjudge/__init__.py (1)

15-32: LGTM: clear dataset config constants for non-judge AudioBench tasks.

Constants are minimal and consistent (DATASET_GROUP="speechlm", METRICS_TYPE="audio", args strings).

tests/test_datasets.py (1)

58-62: LGTM: adds coverage for new datasets in the import/validation loop.

These additions should ensure the new dataset modules expose required defaults.

docs/evaluation/speech-audio.md (1)

48-53: Doc update aligns with new --skip_data_dir_check behavior.

Including --skip_data_dir_check for the --no-audio path is consistent with the new global guidance.

nemo_skills/dataset/librispeech-pc/__init__.py (1)

15-27: LGTM: minimal, consistent LibriSpeech-PC dataset defaults.

Defaults (DEFAULT_SPLIT="test-clean", METRICS_TYPE="audio", prompt/eval args) look aligned with the audio evaluation pipeline.

nemo_skills/dataset/audiobench/__init__.py (1)

15-36: Config-only initializer looks consistent and clear.
Docstring + DATASET_GROUP/SCORE_MODULE/BENCHMARKS layout is straightforward and should integrate cleanly with dataset discovery.

docs/evaluation/speech-audio.md

nemo_skills/dataset/audiobench/judge/__init__.py

nemo_skills/dataset/audiobench/prepare.py

coderabbitai · 2025-12-12T16:06:50Z

nemo_skills/dataset/audiobench/prepare.py

+def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float:
+    """Compute audio duration in seconds from array and sampling rate."""
+    if audio_array is None or len(audio_array) == 0:
+        return 0.0
+    return float(len(audio_array) / sampling_rate)
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full file to understand context and verify the issues wc -l nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 107

🏁 Script executed:

# Check the actual code at the specified line ranges sed -n '111,116p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 337

🏁 Script executed:

# Check lines 229-370 and 475-513 sed -n '229,370p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 5062

🏁 Script executed:

# Check lines 475-513 sed -n '475,513p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 1644

🏁 Script executed:

# Check imports and Python version requirements head -30 nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 1267

🏁 Script executed:

# Search for JUDGE_DATASETS and NONJUDGE_DATASETS definitions grep -n "JUDGE_DATASETS\|NONJUDGE_DATASETS" nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 707

🏁 Script executed:

# Check if sys.path is cleaned up or if there's a sys.path manipulation issue grep -n -A5 "sys.path" nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 303

🏁 Script executed:

# Check if there's cleanup after sys.path.insert sed -n '254,270p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 716

🏁 Script executed:

# Check the template files - do they exist in the package? find nemo_skills/dataset/audiobench -name "__init__.py"

Repository: NVIDIA-NeMo/Skills

Length of output: 205

🏁 Script executed:

# Check the full context around category logging issue sed -n '485,505p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 933

🏁 Script executed:

# Check what JUDGE_DATASETS and NONJUDGE_DATASETS contain sed -n '41,120p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 2323

🏁 Script executed:

# Verify the init.py template files exist ls -la nemo_skills/dataset/audiobench/ | grep -E "judge|nonjudge|__init__"

Repository: NVIDIA-NeMo/Skills

Length of output: 241

🏁 Script executed:

# Check directory structure find nemo_skills/dataset/audiobench -type d

Repository: NVIDIA-NeMo/Skills

Length of output: 169

🏁 Script executed:

# Verify what's actually in the judge/__init__.py and nonjudge/__init__.py cat nemo_skills/dataset/audiobench/judge/__init__.py

Repository: NVIDIA-NeMo/Skills

Length of output: 1605

🏁 Script executed:

# Check how the code actually copies init files (lines 479-483) sed -n '475,485p' nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 486

🏁 Script executed:

# Verify the actual sys.path behavior - check if there's any cleanup elsewhere grep -n "sys.path.pop\|sys.path.remove\|sys.path =" nemo_skills/dataset/audiobench/prepare.py

Repository: NVIDIA-NeMo/Skills

Length of output: 44

Guard sampling_rate against zero/falsy values; fix sys.path pollution; correct template __init__.py copy logic; use normalized dataset_name for category logging.

These are correctness footguns causing runtime failures (division by zero), module import state corruption across datasets, broken template initialization, and misleading logs.

sampling_rate guard (line 115): Add zero check before division:

def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float: """Compute audio duration in seconds from array and sampling rate.""" if audio_array is None or len(audio_array) == 0: return 0.0 + if not sampling_rate: + return 0.0 return float(len(audio_array) / sampling_rate)

sys.path cleanup (line 254): Add finally block to remove inserted path:

sys.path.insert(0, str(audiobench_path / "src")) try: from dataset import Dataset except ImportError as e: raise ImportError( f"Failed to import AudioBench Dataset class: {e}\n" f"AudioBench path: {audiobench_path}\n" f"Make sure AudioBench repository is properly set up." ) + finally: + if sys.path and sys.path[0] == str(audiobench_path / "src"): + sys.path.pop(0)

Template init copy logic (lines 479–483): Copy from package template, not from output directory:

for category in ["judge", "nonjudge"]: category_dir = output_dir / category category_dir.mkdir(exist_ok=True) # Copy category __init__.py if missing init_file = category_dir / "__init__.py" - template_init = output_dir / category / "__init__.py" + template_init = Path(__file__).parent / category / "__init__.py" if not init_file.exists() and template_init.exists(): shutil.copy2(template_init, init_file)

Category logging (line 497): Use normalized dataset_name instead of loop variable name:

- category = "judge" if name in JUDGE_DATASETS else "nonjudge" + category = "judge" if dataset_name in JUDGE_DATASETS else "nonjudge" + if dataset_name not in JUDGE_DATASETS and dataset_name not in NONJUDGE_DATASETS: + category = "unknown"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float:

"""Compute audio duration in seconds from array and sampling rate."""

if audio_array is None or len(audio_array) == 0:

return 0.0

return float(len(audio_array) / sampling_rate)

def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float:

"""Compute audio duration in seconds from array and sampling rate."""

if audio_array is None or len(audio_array) == 0:

return 0.0

if not sampling_rate:

return 0.0

return float(len(audio_array) / sampling_rate)

🤖 Prompt for AI Agents

In nemo_skills/dataset/audiobench/prepare.py around lines 111-116, 254, 479-483 and 497: guard against a zero or falsy sampling_rate by returning 0.0 or raising a clear error before performing the division in get_audio_duration to avoid division-by-zero; when temporarily inserting a path into sys.path, ensure you remove that path in a finally block (or use context management) so sys.path is not polluted after imports; when copying the template __init__.py, copy from the package template source location (the template package path) rather than from the output directory so the correct template is used; and for category logging replace the loop variable use with the normalized dataset_name so logs reference the normalized name rather than the iterator variable.

nemo_skills/dataset/audiobench/prepare.py

nemo_skills/dataset/librispeech-pc/prepare.py

nemo_skills/prompt/config/judge/audiobench_binary.yaml

nemo_skills/prompt/config/judge/audiobench.yaml

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous · 2025-12-12T16:52:23Z

@coderabbitai review

coderabbitai · 2025-12-12T16:52:29Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (13)

nemo_skills/prompt/config/judge/audiobench_binary.yaml (1)

5-29: Keep the “rating” wording as-is (per prior clarification).
No further changes suggested here.

nemo_skills/dataset/librispeech-pc/prepare.py (5)

35-45: Harden downloader: enforce https + fix tqdm accounting.

 def download_with_progress(url: str, output_path: Path, desc: str):
     """Download file with tqdm progress bar."""
+    if not url.startswith("https://"):
+        raise ValueError(f"Refusing to download non-https URL: {url}")
     with tqdm(unit="B", unit_scale=True, unit_divisor=1024, desc=desc) as pbar:
 
         def reporthook(blocknum, blocksize, totalsize):
             if pbar.total != totalsize:
                 pbar.total = totalsize
-            pbar.update(blocksize)
+            downloaded = blocknum * blocksize
+            pbar.update(max(0, downloaded - pbar.n))
 
         urllib.request.urlretrieve(url, output_path, reporthook)

57-73: Fix manifest presence check + remove tarfile(..., filter="data") incompat + add safe extraction.
filter="data" will break on some supported Python versions, and extract* needs traversal hardening. Also, check both manifests before early return.

 def download_manifests(output_dir: Path) -> Path:
     """Download LibriSpeech-PC manifests if not already present."""
-    if (output_dir / "test-clean.json").exists():
+    output_dir.mkdir(parents=True, exist_ok=True)
+    if (output_dir / "test-clean.json").exists() and (output_dir / "test-other.json").exists():
         return output_dir
+
+    def safe_extract_member(tar: tarfile.TarFile, member: tarfile.TarInfo, dest: Path) -> None:
+        dest = dest.resolve()
+        target = (dest / member.name).resolve()
+        if not str(target).startswith(str(dest) + os.sep):
+            raise ValueError(f"Refusing to extract path outside destination: {member.name}")
+        tar.extract(member, dest)
@@
     with tarfile.open(tar_path, "r:gz") as tar:
         for member in tar.getmembers():
             if member.name in ["test-clean.json", "test-other.json"]:
-                tar.extract(member, output_dir, filter="data")
+                safe_extract_member(tar, member, output_dir)

75-86: Fix split directory check + avoid filter="data" in extractall.
split.replace("-", "_") likely won’t match the extracted LibriSpeech/test-clean directory name, causing repeated downloads.

 def download_audio(split: str, audio_dir: Path):
     """Download LibriSpeech audio files if not already present."""
-    split_dir = audio_dir / "LibriSpeech" / split.replace("-", "_")
+    split_dir = audio_dir / "LibriSpeech" / split
     if split_dir.exists():
         return
@@
     with tarfile.open(tar_path, "r:gz") as tar:
-        tar.extractall(audio_dir, filter="data")
+        # Avoid `filter=...` for broader Python compatibility; consider adding
+        # a safe-extract wrapper (like in download_manifests) if you want traversal protection here too.
+        tar.extractall(audio_dir)

89-145: Fix container path composition (hardcoded root + likely double “LibriSpeech/” prefix).
Make container root configurable and normalize audio_filepath from the manifest to a relative path under LibriSpeech/.

             audio_id = Path(audio_filepath).stem
 
-            container_path = f"/dataset/librispeech-pc/LibriSpeech/{audio_filepath}"
+            container_root = os.getenv("NEMO_SKILLS_DATASET_ROOT", "/dataset")
+            parts = Path(audio_filepath).parts
+            if "LibriSpeech" in parts:
+                rel = Path(*parts[parts.index("LibriSpeech") + 1 :])
+            else:
+                rel = Path(audio_filepath.lstrip("/"))
+            container_path = f"{container_root}/librispeech-pc/LibriSpeech/{rel.as_posix()}"

147-172: Add --data_dir (docs mention it) and don’t write into the installed package dir by default.

 def main():
     parser = argparse.ArgumentParser(description="Prepare LibriSpeech-PC for ASR evaluation")
+    parser.add_argument(
+        "--data_dir",
+        type=str,
+        default=os.getenv("NEMO_SKILLS_DATA_DIR"),
+        help="Base data dir. Output goes under <data_dir>/librispeech-pc.",
+    )
@@
     args = parser.parse_args()
 
-    data_dir = Path(__file__).parent
-    audio_dir = data_dir
-    audio_dir.mkdir(exist_ok=True)
+    if not args.data_dir:
+        raise SystemExit("Missing --data_dir and NEMO_SKILLS_DATA_DIR is not set.")
+    data_dir = (Path(args.data_dir).expanduser().resolve() / "librispeech-pc")
+    audio_dir = data_dir
+    audio_dir.mkdir(parents=True, exist_ok=True)

nemo_skills/dataset/audiobench/prepare.py (7)

111-116: Guard against zero/falsy sampling_rate (division-by-zero).

 def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float:
     """Compute audio duration in seconds from array and sampling rate."""
     if audio_array is None or len(audio_array) == 0:
         return 0.0
+    if not sampling_rate:
+        return 0.0
     return float(len(audio_array) / sampling_rate)

124-176: Make manifest audio root configurable (don’t hardcode /data).
Right now the script writes under output_dir/... but the manifest always points at /data/audiobench/....

 def create_manifest_entry(
@@
     sample_id: int,
     category: str,
+    audio_root: str,
 ) -> Dict:
@@
-    audio_rel_path = f"/data/audiobench/{category}/audio/{dataset_name}/{audio_filename}"
+    audio_rel_path = f"{audio_root.rstrip('/')}/audiobench/{category}/audio/{dataset_name}/{audio_filename}"
@@
     entry = {
@@
     }

253-263: Avoid leaking sys.path mutations (add finally cleanup).

-    sys.path.insert(0, str(audiobench_path / "src"))
+    inserted = str(audiobench_path / "src")
+    sys.path.insert(0, inserted)
     try:
         from dataset import Dataset
     except ImportError as e:
         raise ImportError(
@@
             f"Make sure AudioBench repository is properly set up."
         )
+    finally:
+        if sys.path and sys.path[0] == inserted:
+            sys.path.pop(0)

372-426: Fix default output_dir (don’t write into site-packages); align with CLI help.

     parser.add_argument(
         "--output_dir",
         type=str,
         default=None,
         help="Output directory (defaults to $NEMO_SKILLS_DATA_DIR/audiobench)",
     )
+    parser.add_argument(
+        "--audio-root",
+        type=str,
+        default=os.getenv("NEMO_SKILLS_AUDIO_ROOT", "/data"),
+        help="Container-visible root prefix for audio paths (default: /data).",
+    )
@@
-    if args.output_dir:
-        output_dir = Path(args.output_dir)
-    else:
-        # Use dataset directory as output (files will be in nemo_skills/dataset/audiobench/)
-        output_dir = Path(__file__).parent
+    if args.output_dir:
+        output_dir = Path(args.output_dir)
+    else:
+        base = os.getenv("NEMO_SKILLS_DATA_DIR")
+        if not base:
+            raise SystemExit(
+                "Missing --output_dir and NEMO_SKILLS_DATA_DIR is not set. "
+                "Refusing to write into the installed package directory."
+            )
+        output_dir = Path(base) / "audiobench"

474-483: Fix init.py template copy (current code copies from the destination to itself).

     for category in ["judge", "nonjudge"]:
         category_dir = output_dir / category
         category_dir.mkdir(exist_ok=True)
@@
         init_file = category_dir / "__init__.py"
-        template_init = output_dir / category / "__init__.py"
+        template_init = Path(__file__).parent / category / "__init__.py"
         if not init_file.exists() and template_init.exists():
             shutil.copy2(template_init, init_file)

488-498: Use normalized dataset_name for category logging (and handle unknown).

-        category = "judge" if name in JUDGE_DATASETS else "nonjudge"
+        if dataset_name in JUDGE_DATASETS:
+            category = "judge"
+        elif dataset_name in NONJUDGE_DATASETS:
+            category = "nonjudge"
+        else:
+            category = "unknown"

341-350: Pass the new audio_root through to manifest entries.
Follow-up to making /data configurable.

             entry = create_manifest_entry(
                 sample=sample,
                 audio_filename=audio_filename,
                 duration=duration,
                 dataset_name=dataset_name,
                 sample_id=idx,
                 category=category,
+                audio_root=os.getenv("NEMO_SKILLS_AUDIO_ROOT", "/data"),
             )

🧹 Nitpick comments (1)

nemo_skills/evaluation/evaluator/audio.py (1)

328-339: Nice missing-generation normalization; consider a constant/set for task types.
Current logic looks correct; using a set (or module-level constant) would reduce churn risk as more task types are added.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0ae98e and 2894e7e.

📒 Files selected for processing (10)

nemo_skills/dataset/audiobench/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/judge/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/nonjudge/__init__.py (1 hunks)
nemo_skills/dataset/audiobench/prepare.py (1 hunks)
nemo_skills/dataset/librispeech-pc/__init__.py (1 hunks)
nemo_skills/dataset/librispeech-pc/prepare.py (1 hunks)
nemo_skills/evaluation/evaluator/audio.py (2 hunks)
nemo_skills/evaluation/metrics/audio_metrics.py (5 hunks)
nemo_skills/prompt/config/judge/audiobench.yaml (1 hunks)
nemo_skills/prompt/config/judge/audiobench_binary.yaml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

nemo_skills/dataset/audiobench/init.py
nemo_skills/prompt/config/judge/audiobench.yaml
nemo_skills/dataset/audiobench/judge/init.py
nemo_skills/dataset/librispeech-pc/init.py
nemo_skills/dataset/audiobench/nonjudge/init.py

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: Jorjeous
Repo: NVIDIA-NeMo/Skills PR: 1103
File: nemo_skills/prompt/config/judge/audiobench.yaml:15-28
Timestamp: 2025-12-12T16:09:48.719Z
Learning: In AudioBench judge prompt configuration (nemo_skills/prompt/config/judge/audiobench.yaml), having duplicate Score0 entries is intentional - one for "refusing to give concrete results" and another for "completely misaligned" answers. These should remain as separate entries rather than being combined.

📚 Learning: 2025-12-12T16:09:48.719Z

Learnt from: Jorjeous
Repo: NVIDIA-NeMo/Skills PR: 1103
File: nemo_skills/prompt/config/judge/audiobench.yaml:15-28
Timestamp: 2025-12-12T16:09:48.719Z
Learning: In AudioBench judge prompt configuration (nemo_skills/prompt/config/judge/audiobench.yaml), having duplicate Score0 entries is intentional - one for "refusing to give concrete results" and another for "completely misaligned" answers. These should remain as separate entries rather than being combined.

Applied to files:

nemo_skills/dataset/audiobench/prepare.py
nemo_skills/evaluation/metrics/audio_metrics.py
nemo_skills/prompt/config/judge/audiobench_binary.yaml

🪛 Ruff (0.14.8)

nemo_skills/dataset/librispeech-pc/prepare.py

39-39: Unused function argument: blocknum

(ARG001)

44-44: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

nemo_skills/dataset/audiobench/prepare.py

213-213: subprocess call: check for execution of untrusted input

(S603)

214-214: Starting a process with a partial executable path

(S607)

220-220: Consider moving this statement to an else block

(TRY300)

234-234: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

258-262: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

258-262: Avoid specifying long messages outside the exception class

(TRY003)

269-269: Do not catch blind exception: Exception

(BLE001)

270-270: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

270-270: Create your own exception

(TRY002)

270-270: Avoid specifying long messages outside the exception class

(TRY003)

336-336: Do not catch blind exception: Exception

(BLE001)

354-354: Do not catch blind exception: Exception

(BLE001)

511-511: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: unit-tests

🔇 Additional comments (4)

nemo_skills/evaluation/evaluator/audio.py (1)

358-362: Treating Translation like AST is consistent with BLEU-based evaluation.
LGTM.

nemo_skills/evaluation/metrics/audio_metrics.py (3)

215-219: Good: only collect judge_ratings when judgement exists + category is open.
LGTM.

245-250: judge_score computation is fine once ratings are consistently 0–5.
After the binary mapping fix above, * 20 is consistent.

312-315: Conditional printing of judge_score is clean.
LGTM.

nemo_skills/evaluation/metrics/audio_metrics.py

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous · 2025-12-16T13:53:28Z

@coderabbitai review

coderabbitai · 2025-12-16T13:53:35Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2025-12-16T13:54:01Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (6)

nemo_skills/dataset/librispeech-pc/prepare.py (1)
36-46: Consider validating URL scheme before download.

urllib.request.urlretrieve accepts arbitrary schemes including file://. Since the URLs are constants in this file, the risk is low, but adding a scheme check provides defense-in-depth.
 def download_with_progress(url: str, output_path: Path, desc: str):
     """Download file with tqdm progress bar."""
+    if not url.startswith(("https://", "http://")):
+        raise ValueError(f"Refusing to download non-http(s) URL: {url}")
     with tqdm(unit="B", unit_scale=True, unit_divisor=1024, desc=desc) as pbar:
docs/evaluation/speech-audio.md (1)
5-8: Fix grammar in warning block.

Line 6: "If you want to evaluation…" → "If you want to evaluate…".
-    If you want to evaluation without audio files (not recommended) use
+    If you want to evaluate without audio files (not recommended) use
nemo_skills/dataset/audiobench/prepare.py (4)
109-113: Guard against zero sampling_rate to prevent division by zero.

If sampling_rate is 0 or falsy (e.g., missing from audio metadata), this will raise a ZeroDivisionError.
 def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> float:
     """Compute audio duration in seconds from array and sampling rate."""
     if audio_array is None or len(audio_array) == 0:
         return 0.0
+    if not sampling_rate:
+        return 0.0
     return float(len(audio_array) / sampling_rate)
526-531: Default output_dir writes into the package directory (often read-only).

When --output_dir is not provided, the script defaults to Path(__file__).parent, which is the installed package directory (e.g., site-packages/.../nemo_skills/dataset/audiobench/). This location is typically read-only and inappropriate for runtime data.
     # Determine output directory
     if args.output_dir:
         output_dir = Path(args.output_dir)
     else:
-        # Use dataset directory as output (files will be in nemo_skills/dataset/audiobench/)
-        output_dir = Path(__file__).parent
+        base = os.getenv("NEMO_SKILLS_DATA_DIR")
+        if not base:
+            raise SystemExit(
+                "Missing --output_dir and NEMO_SKILLS_DATA_DIR is not set. "
+                "Refusing to write into the installed package directory."
+            )
+        output_dir = Path(base) / "audiobench"
556-565: Template __init__.py copy logic references output directory instead of package source.

Lines 563-564 set template_init = output_dir / category / "__init__.py", which is the same location as init_file. The template should be copied from the package's source directory.
         # Copy category __init__.py
         init_file = category_dir / "__init__.py"
-        template_init = output_dir / category / "__init__.py"
+        template_init = Path(__file__).parent / category / "__init__.py"
         if not init_file.exists() and template_init.exists():
             shutil.copy2(template_init, init_file)
578-579: Use normalized dataset_name for category determination, not loop variable name.

After normalization (lines 572-576) may add _test suffix, the category check at line 579 still uses the original name. This could cause incorrect category assignment.
         # Determine category for logging
-        category = "judge" if name in JUDGE_DATASETS else "nonjudge"
+        category = "judge" if dataset_name in JUDGE_DATASETS else "nonjudge"

🧹 Nitpick comments (3)

nemo_skills/dataset/audiobench/prepare.py (3)
161-163: Hardcoded /data/ prefix may not match local or alternative deployment roots.

The audio path prefix is hardcoded to /data/audiobench/.... Consider making this configurable via an environment variable or CLI argument for flexibility across different deployment scenarios.

Please verify if this hardcoded path is intentional for the target deployment environment, or if it should be configurable similar to how librispeech-pc/prepare.py uses NEMO_SKILLS_AUDIO_ROOT.

229-234: Preserve exception chain when re-raising.

Use raise ... from e to maintain the exception chain for better debugging.
     try:
         from datasets import load_dataset
     except Exception as e:
         raise ImportError(
             f"Failed to import HuggingFace 'datasets'. Please ensure it is installed.\nOriginal error: {e}"
-        )
+        ) from e
379-388: Preserve exception chain when re-raising.

Use raise ... from e to maintain the exception chain for debugging.
     except Exception as e:
         raise Exception(
             "Failed to load AudioBench dataset via HuggingFace.\n"
             f"- Requested dataset_name: {dataset_name}\n"
             ...
-        )
+        ) from e

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2894e7e and 9ad76ae.

📒 Files selected for processing (4)

docs/evaluation/speech-audio.md (4 hunks)
nemo_skills/dataset/audiobench/prepare.py (1 hunks)
nemo_skills/dataset/librispeech-pc/prepare.py (1 hunks)
nemo_skills/evaluation/metrics/audio_metrics.py (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-12-12T16:09:53.870Z

Learnt from: Jorjeous
Repo: NVIDIA-NeMo/Skills PR: 1103
File: nemo_skills/prompt/config/judge/audiobench.yaml:15-28
Timestamp: 2025-12-12T16:09:53.870Z
Learning: In AudioBench judge prompt configuration (nemo_skills/prompt/config/judge/audiobench.yaml), having duplicate Score0 entries is intentional - one for "refusing to give concrete results" and another for "completely misaligned" answers. These should remain as separate entries rather than being combined.

Applied to files:

nemo_skills/evaluation/metrics/audio_metrics.py
nemo_skills/dataset/audiobench/prepare.py

🪛 Ruff (0.14.8)

nemo_skills/dataset/librispeech-pc/prepare.py

46-46: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

93-93: Uses of tarfile.extractall()

(S202)

181-181: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/dataset/audiobench/prepare.py

231-231: Do not catch blind exception: Exception

(BLE001)

232-234: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

232-234: Avoid specifying long messages outside the exception class

(TRY003)

371-371: Do not catch blind exception: Exception

(BLE001)

379-379: Do not catch blind exception: Exception

(BLE001)

380-388: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

380-388: Create your own exception

(TRY002)

380-388: Avoid specifying long messages outside the exception class

(TRY003)

449-449: Do not catch blind exception: Exception

(BLE001)

467-467: Do not catch blind exception: Exception

(BLE001)

592-592: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (8)

docs/evaluation/speech-audio.md (1)

347-409: LGTM!

The new AudioBench and LibriSpeech-PC documentation sections are well-structured and follow the existing documentation patterns. They clearly document dataset locations, preparation commands, and available splits.

nemo_skills/evaluation/metrics/audio_metrics.py (3)

77-113: LGTM! Judge result extraction properly handles both formats.

The implementation correctly:

Parses decimal ratings with [0-9]+(?:\.[0-9]+)?

Clamps ratings to [0, 5] range

Maps legacy Yes/No to 5.0/0.0 for consistent 0-100 scaling when multiplied by 20

246-250: LGTM!

The judge_score computation correctly converts the 0-5 rating scale to 0-100 by multiplying the average by 20, and the empty-list guard prevents division by zero.

313-315: LGTM!

The judge_score is correctly formatted as a plain number (0-100 scale) rather than as a percentage, which aligns with the scaling applied in get_metrics().

nemo_skills/dataset/librispeech-pc/prepare.py (4)

67-73: LGTM! Tarfile filter compatibility handled correctly.

The version guard sys.version_info >= (3, 11, 4) properly enables the filter="data" security feature on supported Python versions while maintaining compatibility with Python 3.10.

126-130: LGTM! Audio path construction handles prefix correctly.

The logic properly strips leading slashes and the LibriSpeech/ prefix to avoid double-prefixing, and uses NEMO_SKILLS_AUDIO_ROOT environment variable for configurability.

180-184: LGTM! Data directory handling is robust.

The script correctly requires --data_dir or NEMO_SKILLS_DATA_DIR environment variable, preventing accidental writes to the package installation directory.

89-93: Acceptable risk: extractall on trusted source.

The tarfile.extractall() call is flagged by static analysis (S202), but the risk is mitigated by:

The archive source is the official OpenSLR repository (trusted)

The filter="data" security feature is enabled on Python 3.11.4+

For Python 3.10/3.11.0-3.11.3, manual member filtering could further harden this, but given the trusted source this is acceptable.

gwarmstrong

Quick Q about merge confict.

Also, I suspect we will need to add these datasets to the exclude list for gpu-tests: https://github.com/NVIDIA-NeMo/Skills/blob/main/tests/gpu-tests/test_eval.py#L29-L47

Re-running slurm tests to confirm.

nemo_skills/pipeline/prepare_data.py

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous · 2025-12-16T18:05:59Z

Yep, probably we should add this to exlude list

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: dlord <dlord@nvidia.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Adding clan PR with AudioBench and Librispeech PC, already based on u…

f0ae98e

…nified asr metrics file Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous requested a review from melllinia December 12, 2025 15:59

Jorjeous assigned Jorjeous and melllinia Dec 12, 2025

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

Jorjeous added 2 commits December 12, 2025 08:24

upd to multiscore and rabbit considerations

eb9112d

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

pre commit fix

2894e7e

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

coderabbitai bot reviewed Dec 12, 2025

View reviewed changes

nemo_skills/evaluation/metrics/audio_metrics.py Show resolved Hide resolved

Jorjeous mentioned this pull request Dec 12, 2025

Audiobench and LibriSpeech-PC Benchmarks Evaluation #1060

Closed

Jorjeous changed the title ~~Adding clan PR with AudioBench and Librispeech PC,~~ Adding clan PR with AudioBench and Librispeech PC. Dec 16, 2025

Jorjeous and others added 5 commits December 16, 2025 05:11

Revritten prepare.py to avoid using cloned repo

af4d5b9

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

address rabbit comments

2885824

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

update prepare.py for librispeech to address rabbit comments

d5edeb8

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Merge branch 'main' into add-audiobench-librispeech

f0f473b

Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

pre commit run

9ad76ae

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous requested a review from gwarmstrong December 16, 2025 13:54

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

gwarmstrong reviewed Dec 16, 2025

View reviewed changes

nemo_skills/pipeline/prepare_data.py Outdated Show resolved Hide resolved

gwarmstrong added the run GPU tests label Dec 16, 2025

Jorjeous and others added 2 commits December 16, 2025 09:47

add asr leaderboard

e6e9f4e

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Merge branch 'main' into add-audiobench-librispeech

63805be

Jorjeous added run GPU tests and removed run GPU tests labels Dec 16, 2025

upd prep for ls

88eefb2

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous added run GPU tests and removed run GPU tests labels Dec 16, 2025

data dir for librispeech

66ae86b

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous added run GPU tests and removed run GPU tests labels Dec 16, 2025

upd librispeechpc init

3609ba5

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

Jorjeous added run GPU tests and removed run GPU tests labels Dec 16, 2025

Jorjeous added 2 commits December 16, 2025 12:58

exclude librispeech

c5e4a3c

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

pre commit

48c8be0

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

gwarmstrong added run GPU tests and removed run GPU tests labels Dec 16, 2025

gwarmstrong approved these changes Dec 16, 2025

View reviewed changes

gwarmstrong enabled auto-merge (squash) December 16, 2025 21:11

gwarmstrong merged commit 4563f60 into main Dec 16, 2025
6 checks passed

gwarmstrong deleted the add-audiobench-librispeech branch December 16, 2025 21:20

wasiahmad pushed a commit that referenced this pull request Dec 19, 2025

Adding clan PR with AudioBench and Librispeech PC. (#1103)

d2da066

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

coderabbitai bot mentioned this pull request Dec 23, 2025

HF ASR Leaderboard Fix #1140

Merged

coderabbitai bot mentioned this pull request Jan 9, 2026

Audio input output integration #1157

Merged

wasiahmad pushed a commit that referenced this pull request Feb 4, 2026

Adding clan PR with AudioBench and Librispeech PC. (#1103)

88ad93b

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

coderabbitai bot mentioned this pull request Feb 12, 2026

Eval kit support #1239

Merged

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Adding clan PR with AudioBench and Librispeech PC. (#1103)

97c4b01

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>

Conversation

Jorjeous commented Dec 12, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jorjeous commented Dec 12, 2025

Uh oh!

coderabbitai bot commented Dec 12, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jorjeous commented Dec 16, 2025

Uh oh!

coderabbitai bot commented Dec 16, 2025

Uh oh!

coderabbitai bot commented Dec 16, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

gwarmstrong left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jorjeous commented Dec 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Jorjeous commented Dec 12, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 12, 2025 •

edited

Loading