Conversation

@blisc
Collaborator

@blisc blisc commented Oct 30, 2025

What does this PR do?

Updates MagpieTTS with latest dev changes.

Collection: tts

Changelog

  • Updates MagpieTTS codebase

XuesongYang and others added 30 commits March 17, 2025 11:23
moved t5tts script to magpietts

Signed-off-by: Xuesong Yang <[email protected]>
* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* attn prior inference implementation

Signed-off-by: Paarth Neekhara <[email protected]>

* more hacks

Signed-off-by: Paarth Neekhara <[email protected]>

* minor tweaks

Signed-off-by: Paarth Neekhara <[email protected]>

* cleanups and make text attention strictly monotonic at inference

Signed-off-by: Paarth Neekhara <[email protected]>
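
A minimal sketch of the idea behind this commit (illustrative names only, not the PR's actual code): during autoregressive decoding, mask the text cross-attention so the attended position can only move forward:

```python
import torch

def monotonic_argmax(attn_scores: torch.Tensor, prev_pos: torch.Tensor) -> torch.Tensor:
    """Pick the attended text position, masked so it can only move forward."""
    text_len = attn_scores.size(1)
    positions = torch.arange(text_len, device=attn_scores.device).unsqueeze(0)
    # Strictly monotonic: only positions past the previous one are eligible.
    masked = attn_scores.masked_fill(positions <= prev_pos.unsqueeze(1), float("-inf"))
    return masked.argmax(dim=-1)

scores = torch.randn(2, 10)            # (batch, text_len) cross-attention scores
prev = torch.tensor([3, 5])            # positions attended at the previous step
print(monotonic_argmax(scores, prev))  # always greater than prev, element-wise
```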

* more updates

Signed-off-by: Paarth Neekhara <[email protected]>

* minor tweaks

Signed-off-by: Paarth Neekhara <[email protected]>

* compute head-wise attention maps

Signed-off-by: Paarth Neekhara <[email protected]>

* configurable ctc prior layers during training

Signed-off-by: Paarth Neekhara <[email protected]>

* log only ctc prior layers on tensorboard

Signed-off-by: Paarth Neekhara <[email protected]>

* add layerwise logging

Signed-off-by: Paarth Neekhara <[email protected]>

* more configurable inference

Signed-off-by: Paarth Neekhara <[email protected]>

* more configs

Signed-off-by: Paarth Neekhara <[email protected]>

* updated end prediction logic as per discussion with Roy

Signed-off-by: Paarth Neekhara <[email protected]>

* DPO preference pair creation: add option to choose min length

* Cleanup

* handle cases where predicted codes are very small; haven't tested but should work

Signed-off-by: Paarth Neekhara <[email protected]>

* undo predicted len change since it is not needed

Signed-off-by: Paarth Neekhara <[email protected]>

* clean up notebook

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Co-authored-by: Fejgin, Roy <[email protected]>
When doing Pareto ranking make sure to only compare indices that correspond to metrics.
…#52)

Updated by Jason

* local transformer training tested, prediction not tested

Signed-off-by: Paarth Neekhara <[email protected]>

* local transformer updates

Signed-off-by: Paarth Neekhara <[email protected]>

* local transformer inference working

Signed-off-by: Paarth Neekhara <[email protected]>

* aligner module

Signed-off-by: Paarth Neekhara <[email protected]>

* aligner module updates

Signed-off-by: Paarth Neekhara <[email protected]>

* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* change aligner text input to encoder output

Signed-off-by: Paarth Neekhara <[email protected]>

* obtain hard alignment from t5tts decoder

Signed-off-by: Paarth Neekhara <[email protected]>

* log hard attention training

Signed-off-by: Paarth Neekhara <[email protected]>

* binarization method, obtain_prior_from_cross_attn fix

Signed-off-by: Paarth Neekhara <[email protected]>

* added configs for local transformer and alignment encoder

Signed-off-by: Paarth Neekhara <[email protected]>

* added prior window decay factors

Signed-off-by: Paarth Neekhara <[email protected]>

* more configs..

Signed-off-by: Paarth Neekhara <[email protected]>

* config was missing

Signed-off-by: Paarth Neekhara <[email protected]>

* slight modification in alignment encoder computation, pass target audio embeddings (removing bos)

Signed-off-by: Paarth Neekhara <[email protected]>

* some comments

Signed-off-by: Paarth Neekhara <[email protected]>

* prior prob configurable

Signed-off-by: Paarth Neekhara <[email protected]>

* update yamls

Signed-off-by: Paarth Neekhara <[email protected]>

* refactor inference prior code

Signed-off-by: Paarth Neekhara <[email protected]>

* set prior epsilon to 0 to avoid any attention scores on unintended parts

Signed-off-by: Paarth Neekhara <[email protected]>
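
For context, a toy sketch of what a windowed prior with epsilon 0 does; all names here are illustrative, not the PR's implementation:

```python
import torch

def apply_prior_window(attn_probs, center, window=8, epsilon=0.0):
    """Zero out (epsilon=0) attention mass outside a window around `center`."""
    text_len = attn_probs.size(1)
    pos = torch.arange(text_len, device=attn_probs.device).unsqueeze(0)
    inside = (pos - center.unsqueeze(1)).abs() <= window
    prior = torch.where(inside, torch.ones_like(attn_probs),
                        torch.full_like(attn_probs, epsilon))
    weighted = attn_probs * prior
    return weighted / weighted.sum(dim=-1, keepdim=True).clamp_min(1e-8)

probs = torch.softmax(torch.randn(2, 50), dim=-1)  # (batch, text_len)
centers = torch.tensor([10, 30])                   # current alignment positions
windowed = apply_prior_window(probs, centers)      # mass outside window is exactly 0
```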

* make prior epsilon configurable in training

Signed-off-by: Paarth Neekhara <[email protected]>

* added rtf metrics and notebook, infer and evaluate changes

Signed-off-by: Paarth Neekhara <[email protected]>

* turn off alignment encoder training after 50k steps

Signed-off-by: Paarth Neekhara <[email protected]>
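
A hedged sketch of step-threshold freezing inside a LightningModule, with stand-in modules and assuming the `pytorch_lightning` package; this is not the PR's actual model code:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class TTSModelSketch(pl.LightningModule):
    """Stand-in model; only the step-threshold freezing logic is the point."""

    FREEZE_STEP = 50_000

    def __init__(self):
        super().__init__()
        self.alignment_encoder = nn.Linear(80, 80)  # stand-in submodule
        self.decoder = nn.Linear(80, 80)            # stand-in submodule

    def training_step(self, batch, batch_idx):
        if self.global_step >= self.FREEZE_STEP:
            # Past the threshold: stop gradients for the alignment encoder
            # and fix its eval-mode behavior, while the rest keeps training.
            self.alignment_encoder.requires_grad_(False)
            self.alignment_encoder.eval()
        feats = self.alignment_encoder(batch)
        return self.decoder(feats).mean()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)
```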

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Updated by Jason, added back inference class

* wavlm speaker eval

Signed-off-by: Shehzeen Hussain <[email protected]>

* connect to inference script

Signed-off-by: Shehzeen Hussain <[email protected]>

* bug fix

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo started, training seems to be working

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo local training seems ok

Signed-off-by: Shehzeen Hussain <[email protected]>

* only one generation per item in val

Signed-off-by: Shehzeen Hussain <[email protected]>

* allow cfg use during generation process

Signed-off-by: Shehzeen Hussain <[email protected]>

* fix cer threshold for 0 reward

Signed-off-by: Shehzeen Hussain <[email protected]>

* use kv cache for grpo generation

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove kv cache for now

Signed-off-by: Shehzeen Hussain <[email protected]>

* kv cache for online po configurable

Signed-off-by: Shehzeen Hussain <[email protected]>
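
A toy sketch of the KV-cache idea these commits toggle: cache each step's key/value projections so a decode step only does new work for the newest token instead of re-encoding the whole prefix. Illustrative code, not the PR's:

```python
import torch

class KVCacheSketch:
    """Accumulate per-step key/value projections during generation."""

    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_step, v_step):  # each (batch, 1, dim), one decode step
        self.k = k_step if self.k is None else torch.cat([self.k, k_step], dim=1)
        self.v = v_step if self.v is None else torch.cat([self.v, v_step], dim=1)
        return self.k, self.v

cache = KVCacheSketch()
q = torch.randn(2, 1, 64)  # query for the current step
for _ in range(3):         # three decode steps
    k, v = cache.append(torch.randn(2, 1, 64), torch.randn(2, 1, 64))
    attn = torch.softmax(q @ k.transpose(1, 2) / 64**0.5, dim=-1) @ v
```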

* configurable reward params

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo val set added in evalset

Signed-off-by: Shehzeen Hussain <[email protected]>

* comments update

Signed-off-by: Shehzeen Hussain <[email protected]>

* modify reward scaling

Signed-off-by: Shehzeen Hussain <[email protected]>

* moved preference optimization code and classes to a new file

Signed-off-by: Shehzeen Hussain <[email protected]>

* missing file

Signed-off-by: Shehzeen Hussain <[email protected]>

* added language option in online PO

Signed-off-by: Shehzeen Hussain <[email protected]>

* some updates in the script

Signed-off-by: Shehzeen Hussain <[email protected]>

* add reference free option

Signed-off-by: Shehzeen Hussain <[email protected]>

* handle corner cases

Signed-off-by: Shehzeen Hussain <[email protected]>

* bug fix in reference free mode and torch.load fix for new container

Signed-off-by: Shehzeen Hussain <[email protected]>
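
The torch.load fix likely concerns PyTorch 2.6 switching the default to `weights_only=True`, which rejects arbitrary pickled objects in older checkpoints. A sketch of the workaround for a trusted checkpoint (the path is illustrative):

```python
import torch

# Since PyTorch 2.6, torch.load defaults to weights_only=True. For a
# trusted checkpoint, the pre-2.6 behavior can be restored explicitly:
ckpt = torch.load("magpie_checkpoint.ckpt", map_location="cpu", weights_only=False)
```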

* added option for pesq reward

Signed-off-by: Shehzeen Hussain <[email protected]>

* pesq device bug fix

Signed-off-by: Shehzeen Hussain <[email protected]>

---------

Signed-off-by: Shehzeen Hussain <[email protected]>
Signed-off-by: Shehzeen Hussain <[email protected]>
* add back missing dev files

Signed-off-by: Jason <[email protected]>

* more bug fixes from merge

Signed-off-by: Jason <[email protected]>

* add latest changes for rc5 docker

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
…to make the recipe work with PTL 1.9+. (#47)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
…. modify and make it optional. (#49)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
…er. (#52)

* structured both loggers for train/val/test.
* enable `resume` param to ensure resumed training logs are merged into the previous run id.
* removed `tb_logger` func.

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* [magpietts] minor fix for the usage of freezing a model.

Signed-off-by: Xuesong Yang <[email protected]>

* fixed a typo.

Signed-off-by: Xuesong Yang <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Jason <[email protected]>
)

* trainer import fix for new pytorch lightning

Signed-off-by: Paarth Neekhara <[email protected]>

* handle strict prior window correctly

Signed-off-by: Paarth Neekhara <[email protected]>

* disable autocasting for the codec model and make the prior window strict

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
…oader. (#54)

* [magpie][lhotse] added a lhotse dataloader for monologue tts. this is a working recipe with num_workers>0 for training and num_workers=0 for val datasets. Still faced issues when num_workers>0 during validation steps. Investigating root causes.
* all contents in a batch are obtained correctly, but dtype mismatches.
* fix dtype for text tokens and codec codes.

Signed-off-by: Xuesong Yang <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_shar_prep] add script to create shar dataset.
* with more efficient changes.
* bugfix: previously the last batch would be dropped if its size was less than the buffer size; this fixes it.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] clean up commented lines.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] bugfix to force spawn over fork to address CUDA initialization errors when multiple workers are used during validation.

Signed-off-by: Xuesong Yang <[email protected]>
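
Background on the fork-vs-spawn fix, as a self-contained sketch (toy dataset; not the PR's dataloader code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor([idx])

# 'fork' duplicates the parent process, including any live CUDA context,
# which CUDA does not support in child processes; 'spawn' starts fresh
# interpreter processes and avoids the initialization error.
loader = DataLoader(
    ToyDataset(),
    batch_size=2,
    num_workers=2,
    multiprocessing_context="spawn",
)
```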

* [lhotse_dataloader] avoid setting up the tokenizer again for training since it is already set up during model initialization.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] switch to setting up the tokenizer inside __getitem__ to support spawn worker processes.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpietts][lhotse] fixed a bug in attatch_tensor which saved a wrong numpy array; update yaml config

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_config] enforce quadratic_duration if using lhotse dataloader to avoid frequent OOMs.
changed yaml name to monologue

Signed-off-by: Xuesong Yang <[email protected]>
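
A hedged sketch of how quadratic_duration is used with Lhotse's DynamicBucketingSampler (paths and values are illustrative):

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_shar(in_dir="data/shar")  # illustrative path

# quadratic_duration adds a length-squared penalty to each cut's effective
# duration, so batches that contain long utterances hold fewer cuts. This
# roughly tracks the quadratic memory cost of attention and curbs OOMs.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=600.0,       # illustrative: seconds of audio per batch
    quadratic_duration=20.0,  # illustrative penalty scale
    num_buckets=10,
    shuffle=True,
)
```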

* [magpie][example] add LR logger.

Signed-off-by: Xuesong Yang <[email protected]>

* cleanup

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_yaml] made changes for yaml config according to comments.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] added docstring for lhotse dataset

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] remove yamls

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] remove Edresson's lhotse implementations, and update yaml name.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] add a README with guidance on how to create lhotse data

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] update MonoCut example.

Signed-off-by: Xuesong Yang <[email protected]>

* rename config

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Jason <[email protected]>
* add fix to infer script

Signed-off-by: Jason <[email protected]>

* add no context option

Signed-off-by: Jason <[email protected]>

* add nemo option to infer script

Signed-off-by: Jason <[email protected]>

* add in latest bf16 changes from Edresson

Signed-off-by: Jason <[email protected]>

* add comment

Signed-off-by: Jason <[email protected]>

* enforce codec precision for now

Signed-off-by: Jason <[email protected]>

* fix autocast bug

Signed-off-by: Jason <[email protected]>

* another bug fix

Signed-off-by: Jason <[email protected]>

* clean PR

Signed-off-by: Jason <[email protected]>

* change hardcoded epsilon

Signed-off-by: Jason <[email protected]>

* infer changes

Signed-off-by: Jason <[email protected]>

* address review

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
* bug fix in context text embedding initialization

Signed-off-by: Paarth Neekhara <[email protected]>

* bug fixes in infer and evaluate

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
Make sure to reserve enough tokens for special uses like EOS/BOS.

WARNING: old models will be incompatible with the updated inference YAMLs
and will need to override the num_audio_tokens_per_codebook to the value they were
trained with.
#51)

* preference optimization updates, trainer updates, remove redundant datagen class

Signed-off-by: Shehzeen Hussain <[email protected]>

* revert model pt change, add freeze_model function

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove redundant inference class

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove custom freeze model function and use lightning inbuilt freeze instead

Signed-off-by: Shehzeen Hussain <[email protected]>

* added a readme for magpie preference optimization

Signed-off-by: Shehzeen Hussain <[email protected]>

* change class name from MagpieTTSModelInference to MagpieTTSModelPrefDataGen

Signed-off-by: Shehzeen Hussain <[email protected]>

* update class name from MagpieTTSModelPrefDataGen to MagpieTTSModelOfflinePODataGen

Signed-off-by: Shehzeen Hussain <[email protected]>

---------

Signed-off-by: Shehzeen Hussain <[email protected]>
…codes (#66)

* [magpie][wandb] add logging for pad ratios for text tokens and audio codes.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][wandb] fix pad ratio calculation

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* Bugfix: num_audio_tokens_per_codebook

Make sure to reserve enough tokens for special uses like EOS/BOS.

WARNING: old models will be incompatible with the updated inference YAMLs
and will need to override the num_audio_tokens_per_codebook to the value they were
trained with.

* Rework how the number of codes and codebooks is handled (WIP)

* Reorder the code a bit for clarity

* Refactor codebook configuration

* read codec parameters from codec checkpoint; remove corresponding configuration from Magpie YAML files
* add mechanism for backward compatibility with older checkpoints:
** If using `infer_and_evaluate.py`, just set the --legacy_codebooks command line flag
** If running training or inference with the Hydra command line, override using the following flags:
```
forced_num_all_tokens_per_codebook: 2048
forced_audio_bos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -1}           # 2047
forced_audio_eos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -2}           # 2046
forced_context_audio_bos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -4}   # 2044
forced_context_audio_eos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -3}   # 2045
```
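
Note that `${sum:...}` is not an OmegaConf builtin, so a custom resolver presumably gets registered somewhere in NeMo. A minimal equivalent, for illustration only:

```python
from omegaconf import OmegaConf

# Register a resolver so ${sum:a,b} interpolations work (sketch, not the
# exact NeMo registration code):
OmegaConf.register_new_resolver("sum", lambda *args: sum(args), replace=True)

cfg = OmegaConf.create(
    {
        "forced_num_all_tokens_per_codebook": 2048,
        "forced_audio_bos_id": "${sum:${forced_num_all_tokens_per_codebook},-1}",
    }
)
print(cfg.forced_audio_bos_id)  # 2047
```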

* Add README on the codebook reorganization

... and how to load legacy checkpoints.

* Cleanup

* Cleanup and fixing typos

* Cleanup

* Cleanup

* Clarify the README on the embedding table layout

* README cleanup

* Rename an attribute for clarity

codec_model_downsample_factor --> codec_model_samples_per_frame
…nd image on the sliding bar instead of incrementing by 1. (#61)

* [magpie][wandb][bugfix] ensure consistent validation step for audio and image on the sliding bar instead of incrementing by 1.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpietts][loggers] support logging metrics using multiple loggers enabled in exp_manager.

* [magpietts][lhotse_dataset] remove useless imports and functions.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* Refine the README on codebook layout updates

* Typo fix

* Bugfix: wire in the `legacy_codebooks` flag in a missing place
* add update config to infer script

Signed-off-by: Jason <[email protected]>

* Update infer_and_evaluate.py

---------

Signed-off-by: Jason <[email protected]>
@github-actions github-actions bot removed the Run CICD label Oct 30, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces MagpieTTS, a text-to-speech model with support for training, inference, evaluation, and preference optimization. The changes include:

  • Core MagpieTTS model implementation and preference optimization variants
  • Comprehensive evaluation and inference scripts with metric computation (CER, WER, SSIM, UTMOSv2, FCD)
  • Lhotse dataset integration for efficient data processing and sharding
  • Test coverage for transformer modules, FCD metrics, and Lhotse filters
  • Utility scripts for data preparation, context audio extraction, and codec processing

Reviewed Changes

Copilot reviewed 61 out of 62 changed files in this pull request and generated 29 comments.

Summary per file:

  • tests/collections/tts/modules/test_fcd_metric.py: Adds comprehensive unit tests for the Frechet Codec Distance metric
  • tests/collections/tts/modules/test_transformer_2501.py: Updates transformer tests to include mask parameters and adds batched inference tests
  • tests/collections/common/test_lhotse_tts_filters.py: Adds tests for Lhotse dataset filters (CER, speaker similarity, validation status)
  • tests/collections/common/test_lhotse_dataloading.py: Removes a duplicate test function
  • scripts/magpietts/*.py: Adds evaluation, inference, data preparation, and codec extraction scripts
  • scripts/magpietts/dpo/*.py: Adds DPO/RPO preference pair creation scripts
  • nemo/collections/tts/modules/utmosv2.py: Adds a UTMOSv2 MOS estimation wrapper
  • nemo/collections/tts/modules/encodec_modules.py: Adds properties for num_codebooks and codebook_size
  • nemo/utils/nemo_logging.py: Adds a stacklevel parameter to logging calls for better source location reporting
  • nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py: Fixes a typo and improves the AggregatedTTSTokenizer implementation

@pytest.mark.unit
def test_codebooks_mismatch_update(self, metric, device, codec):
    """Test that the FCD metric doesn't crash when provided with incorrect number ofcodebooks."""

Copilot AI Oct 30, 2025


Missing space between 'of' and 'codebooks' in the docstring.

Comment on lines 1455 to 1459
# @property
# def codebook_size(self):
# """Returns the size of the implicit codebook."""
# return self.codebook_size_per_group**self.num_groups


Copilot AI Oct 30, 2025


This comment appears to contain commented-out code.

Suggested change
# @property
# def codebook_size(self):
# """Returns the size of the implicit codebook."""
# return self.codebook_size_per_group**self.num_groups

import os
import random
import re
import time

Copilot AI Oct 30, 2025


Import of 'time' is not used.

'alignment_loss': alignment_loss,
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

'batch_metrics': generated_codes_and_metrics['metrics'],
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

Suggested change
def training_step(self, batch, batch_idx):
def ptl_training_step(self, batch, batch_idx):

'text': context_tensors['text'],
'text_lens': context_tensors['text_lens'],
'context_audio_codes': context_tensors['context_audio_codes'],
'context_audio_codes_lens': context_tensors['context_audio_codes_lens'],
'dec_context_size': dec_context_size,
'aligner_attn_soft': aligner_attn_soft,
'aligner_attn_hard': aligner_attn_hard,
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

print("...Making Shars")
out_shar_dir = Path(out_shar_dir)
out_shar_dir.mkdir(parents=True, exist_ok=True)
shard_size = shard_size

Copilot AI Oct 30, 2025


This assignment assigns a variable to itself.

num_audio_samples = num_codec_frames * self.codec_model_samples_per_frame
return num_audio_samples

def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List]]:

Check notice: Code scanning / CodeQL

Non-standard exception raised in special method (Note)

This method raises ValueError; it should raise a LookupError (KeyError or IndexError) instead.
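
The convention matters because Python's fallback iteration and membership protocols treat IndexError/KeyError as "not found", while any other exception propagates out of the loop. A small standalone example:

```python
class Frames:
    """No __iter__: Python's fallback iteration calls __getitem__ with
    0, 1, 2, ... and stops cleanly only on IndexError."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, idx):
        if idx >= len(self._items):
            raise IndexError(idx)  # a ValueError here would escape the for-loop
        return self._items[idx]

for frame in Frames(["a", "b"]):
    print(frame)  # prints "a", "b", then the loop ends without error
```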

Copilot Autofix


To adhere to Python conventions for __getitem__, you should change the exception type in line 232 from ValueError to KeyError. This involves editing the specific line:

  • File: nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
  • Region: within the __getitem__ method, specifically at item access check (lines 230–232).
  • Change: Instead of raise ValueError(...), use raise KeyError(...).

No additional imports are required; KeyError is a built-in exception. No additional definitions or code changes needed.

Suggested changeset 1
nemo/collections/tts/data/text_to_speech_dataset_lhotse.py

Autofix patch. Run the following command in your local git repository to apply it:
cat << 'EOF' | git apply
diff --git a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
--- a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
+++ b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
@@ -229,7 +229,7 @@
         for cut in cuts:
             speaker = cut.supervisions[0].speaker
             if not check_speaker_format(speaker):
-                raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
+                raise KeyError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
             dataset_name = speaker.strip().split()[2].split(":")[-1]
             dataset_name_list.append(dataset_name)
 
EOF

self.target_sample_rate = target_sample_rate
self.codec_model_samples_per_frame = codec_model_samples_per_frame

def __getitem__(self, cuts: CutSet) -> Optional[Dict[str, Any]]:

Check notice: Code scanning / CodeQL

Non-standard exception raised in special method (Note)

This method raises ValueError in four places; each should raise a LookupError (KeyError or IndexError) instead.

Copilot Autofix


To fix this problem, we should change all cases where ValueError is raised in the __getitem__ method of AudioPairLhotseDataset and instead raise KeyError. This includes the branches where required keys ("shard_origin", "context_recording") are missing from cut.custom, and where "shard_origin" does not match the required pattern for extracting a shard index. Only replace the exceptions in this method; ensure the error message is preserved so debugging remains clear. Only edit within the bounds of the shown code—do not change anything else or add unnecessary imports.

Suggested changeset 1
scripts/magpietts/extend_lhotse_shards_with_audio_codes.py

Autofix patch. Run the following command in your local git repository to apply it:
cat << 'EOF' | git apply
diff --git a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
--- a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
+++ b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
@@ -147,17 +147,17 @@
             if not cut.has_custom("shard_origin"):
                 err_msg = f"Cut {cut} is missing required key 'shard_origin'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
             if not cut.has_custom("context_recording"):
                 err_msg = f"Cut {cut} is missing required key 'context_recording'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
 
             # Parse shard index from the custom field, handling potential errors
             origin_path = cut.custom["shard_origin"]
             match = re.search(r"cuts\.(\d+)\.jsonl\.gz$", origin_path)
             if match is None:
-                raise ValueError(f"Could not parse shard index from shard_origin: {origin_path}")
+                raise KeyError(f"Could not parse shard index from shard_origin: {origin_path}")
             shard_idx_origin = int(match.group(1))
 
             # audio shape: (num_channels (1), num_samples) -> (num_samples)
EOF
@blisc blisc marked this pull request as ready for review November 4, 2025 19:24
@blisc blisc closed this Nov 4, 2025
@github-actions github-actions bot removed the Run CICD label Nov 4, 2025