Conversation

@blisc
Collaborator

@blisc blisc commented Oct 30, 2025

What does this PR do?

Updates MagpieTTS with latest dev changes.

Collection: tts

Changelog

  • Updates MagpieTTS codebase

XuesongYang and others added 30 commits March 17, 2025 11:23
moved t5tts script to magpietts

Signed-off-by: Xuesong Yang <[email protected]>
* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* attn prior inference implementation

Signed-off-by: Paarth Neekhara <[email protected]>

* more hacks

Signed-off-by: Paarth Neekhara <[email protected]>

* minor tweaks

Signed-off-by: Paarth Neekhara <[email protected]>

* cleanups and make text attention strictly monotonic at inference

Signed-off-by: Paarth Neekhara <[email protected]>
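
A minimal sketch of the idea behind this commit (illustrative names only, not the PR's actual code): during autoregressive decoding, mask the text cross-attention so the attended position can only move forward:

```python
import torch

def monotonic_argmax(attn_scores: torch.Tensor, prev_pos: torch.Tensor) -> torch.Tensor:
    """Pick the attended text position, masked so it can only move forward."""
    text_len = attn_scores.size(1)
    positions = torch.arange(text_len, device=attn_scores.device).unsqueeze(0)
    # Strictly monotonic: only positions past the previous one are eligible.
    masked = attn_scores.masked_fill(positions <= prev_pos.unsqueeze(1), float("-inf"))
    return masked.argmax(dim=-1)

scores = torch.randn(2, 10)            # (batch, text_len) cross-attention scores
prev = torch.tensor([3, 5])            # positions attended at the previous step
print(monotonic_argmax(scores, prev))  # always greater than prev, element-wise
```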

* more updates

Signed-off-by: Paarth Neekhara <[email protected]>

* minor tweaks

Signed-off-by: Paarth Neekhara <[email protected]>

* compute head-wise attention maps

Signed-off-by: Paarth Neekhara <[email protected]>

* configurable ctc prior layers during training

Signed-off-by: Paarth Neekhara <[email protected]>

* log only ctc prior layers on tensorboard

Signed-off-by: Paarth Neekhara <[email protected]>

* add layerwise logging

Signed-off-by: Paarth Neekhara <[email protected]>

* more configurable inference

Signed-off-by: Paarth Neekhara <[email protected]>

* more configs

Signed-off-by: Paarth Neekhara <[email protected]>

* updated end prediction logic as per discussion with Roy

Signed-off-by: Paarth Neekhara <[email protected]>

* DPO preference pair creation: add option to choose min length

* Cleanup

* handle cases where predicted codes are very small; haven't tested but should work

Signed-off-by: Paarth Neekhara <[email protected]>

* undo predicted len change since it is not needed

Signed-off-by: Paarth Neekhara <[email protected]>

* clean up notebook

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Co-authored-by: Fejgin, Roy <[email protected]>
When doing Pareto ranking make sure to only compare indices that correspond to metrics.
…#52)

Updated by Jason

* local transformer training tested, prediction not tested

Signed-off-by: Paarth Neekhara <[email protected]>

* local transformer updates

Signed-off-by: Paarth Neekhara <[email protected]>

* local transformer inference working

Signed-off-by: Paarth Neekhara <[email protected]>

* aligner module

Signed-off-by: Paarth Neekhara <[email protected]>

* aligner module updates

Signed-off-by: Paarth Neekhara <[email protected]>

* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* wip

Signed-off-by: Paarth Neekhara <[email protected]>

* change aligner text input to encoder output

Signed-off-by: Paarth Neekhara <[email protected]>

* obtain hard alignment from t5tts decoder

Signed-off-by: Paarth Neekhara <[email protected]>

* log hard attention training

Signed-off-by: Paarth Neekhara <[email protected]>

* binarization method, obtain_prior_from_cross_attn fix

Signed-off-by: Paarth Neekhara <[email protected]>

* added configs for local transformer and alignment encoder

Signed-off-by: Paarth Neekhara <[email protected]>

* added prior window decay factors

Signed-off-by: Paarth Neekhara <[email protected]>

* more configs..

Signed-off-by: Paarth Neekhara <[email protected]>

* config was missing

Signed-off-by: Paarth Neekhara <[email protected]>

* slight modification in alignment encoder computation, pass target audio embeddings (removing bos)

Signed-off-by: Paarth Neekhara <[email protected]>

* some comments

Signed-off-by: Paarth Neekhara <[email protected]>

* prior prob configurable

Signed-off-by: Paarth Neekhara <[email protected]>

* update yamls

Signed-off-by: Paarth Neekhara <[email protected]>

* refactor inference prior code

Signed-off-by: Paarth Neekhara <[email protected]>

* set prior epsilon to 0 to avoid any attention scores on unintended parts

Signed-off-by: Paarth Neekhara <[email protected]>
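
For context, a toy sketch of what a windowed prior with epsilon 0 does; all names here are illustrative, not the PR's implementation:

```python
import torch

def apply_prior_window(attn_probs, center, window=8, epsilon=0.0):
    """Zero out (epsilon=0) attention mass outside a window around `center`."""
    text_len = attn_probs.size(1)
    pos = torch.arange(text_len, device=attn_probs.device).unsqueeze(0)
    inside = (pos - center.unsqueeze(1)).abs() <= window
    prior = torch.where(inside, torch.ones_like(attn_probs),
                        torch.full_like(attn_probs, epsilon))
    weighted = attn_probs * prior
    return weighted / weighted.sum(dim=-1, keepdim=True).clamp_min(1e-8)

probs = torch.softmax(torch.randn(2, 50), dim=-1)  # (batch, text_len)
centers = torch.tensor([10, 30])                   # current alignment positions
windowed = apply_prior_window(probs, centers)      # mass outside window is exactly 0
```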

* make prior epsilon configurable in training

Signed-off-by: Paarth Neekhara <[email protected]>

* added rtf metrics and notebook, infer and evaluate changes

Signed-off-by: Paarth Neekhara <[email protected]>

* turn off alignment encoder training after 50k steps

Signed-off-by: Paarth Neekhara <[email protected]>
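
A hedged sketch of step-threshold freezing inside a LightningModule, with stand-in modules and assuming the `pytorch_lightning` package; this is not the PR's actual model code:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class TTSModelSketch(pl.LightningModule):
    """Stand-in model; only the step-threshold freezing logic is the point."""

    FREEZE_STEP = 50_000

    def __init__(self):
        super().__init__()
        self.alignment_encoder = nn.Linear(80, 80)  # stand-in submodule
        self.decoder = nn.Linear(80, 80)            # stand-in submodule

    def training_step(self, batch, batch_idx):
        if self.global_step >= self.FREEZE_STEP:
            # Past the threshold: stop gradients for the alignment encoder
            # and fix its eval-mode behavior, while the rest keeps training.
            self.alignment_encoder.requires_grad_(False)
            self.alignment_encoder.eval()
        feats = self.alignment_encoder(batch)
        return self.decoder(feats).mean()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)
```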

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Updated by Jason, added back inference class

* wavlm speaker eval

Signed-off-by: Shehzeen Hussain <[email protected]>

* connect to inference script

Signed-off-by: Shehzeen Hussain <[email protected]>

* bug fix

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo started, training seems to be working

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo local training seems ok

Signed-off-by: Shehzeen Hussain <[email protected]>

* only one generation per item in val

Signed-off-by: Shehzeen Hussain <[email protected]>

* allow cfg use during generation process

Signed-off-by: Shehzeen Hussain <[email protected]>

* fix cer threshold for 0 reward

Signed-off-by: Shehzeen Hussain <[email protected]>

* use kv cache for grpo generation

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove kv cache for now

Signed-off-by: Shehzeen Hussain <[email protected]>

* kv cache for online po configurable

Signed-off-by: Shehzeen Hussain <[email protected]>
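
A toy sketch of the KV-cache idea these commits toggle: cache each step's key/value projections so a decode step only does new work for the newest token instead of re-encoding the whole prefix. Illustrative code, not the PR's:

```python
import torch

class KVCacheSketch:
    """Accumulate per-step key/value projections during generation."""

    def __init__(self):
        self.k = None
        self.v = None

    def append(self, k_step, v_step):  # each (batch, 1, dim), one decode step
        self.k = k_step if self.k is None else torch.cat([self.k, k_step], dim=1)
        self.v = v_step if self.v is None else torch.cat([self.v, v_step], dim=1)
        return self.k, self.v

cache = KVCacheSketch()
q = torch.randn(2, 1, 64)  # query for the current step
for _ in range(3):         # three decode steps
    k, v = cache.append(torch.randn(2, 1, 64), torch.randn(2, 1, 64))
    attn = torch.softmax(q @ k.transpose(1, 2) / 64**0.5, dim=-1) @ v
```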

* configurable reward params

Signed-off-by: Shehzeen Hussain <[email protected]>

* grpo val set added in evalset

Signed-off-by: Shehzeen Hussain <[email protected]>

* comments update

Signed-off-by: Shehzeen Hussain <[email protected]>

* modify reward scaling

Signed-off-by: Shehzeen Hussain <[email protected]>

* moved preference optimization code and classes to a new file

Signed-off-by: Shehzeen Hussain <[email protected]>

* missing file

Signed-off-by: Shehzeen Hussain <[email protected]>

* added language option in online PO

Signed-off-by: Shehzeen Hussain <[email protected]>

* some updates in the script

Signed-off-by: Shehzeen Hussain <[email protected]>

* add reference free option

Signed-off-by: Shehzeen Hussain <[email protected]>

* handle corner cases

Signed-off-by: Shehzeen Hussain <[email protected]>

* bug fix in reference free mode and torch.load fix for new container

Signed-off-by: Shehzeen Hussain <[email protected]>
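
The torch.load fix likely concerns PyTorch 2.6 switching the default to `weights_only=True`, which rejects arbitrary pickled objects in older checkpoints. A sketch of the workaround for a trusted checkpoint (the path is illustrative):

```python
import torch

# Since PyTorch 2.6, torch.load defaults to weights_only=True. For a
# trusted checkpoint, the pre-2.6 behavior can be restored explicitly:
ckpt = torch.load("magpie_checkpoint.ckpt", map_location="cpu", weights_only=False)
```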

* added option for pesq reward

Signed-off-by: Shehzeen Hussain <[email protected]>

* pesq device bug fix

Signed-off-by: Shehzeen Hussain <[email protected]>

---------

Signed-off-by: Shehzeen Hussain <[email protected]>
Signed-off-by: Shehzeen Hussain <[email protected]>
* add back missing dev files

Signed-off-by: Jason <[email protected]>

* more bug fixes from merge

Signed-off-by: Jason <[email protected]>

* add latest changes for rc5 docker

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
…to make the recipe work with PTL 1.9+. (#47)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
…. modify and make it optional. (#49)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
…er. (#52)

* structured both loggers for train/val/test.
* enable `resume` param to ensure resumed training logs are merged into the previous run id.
* removed `tb_logger` func.

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* [magpietts] minor fix for the usage of freezing a model.

Signed-off-by: Xuesong Yang <[email protected]>

* fixed a typo.

Signed-off-by: Xuesong Yang <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Jason <[email protected]>
)

* trainer import fix for new pytorch lightning

Signed-off-by: Paarth Neekhara <[email protected]>

* handle strict prior window correctly

Signed-off-by: Paarth Neekhara <[email protected]>

* disable autocasting for the codec model and make the prior window strict

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
…oader. (#54)

* [magpie][lhotse] added a lhotse dataloader for monologue tts. this is a working recipe with num_workers>0 for training and num_workers=0 for val datasets. Still faced issues when num_workers>0 during validation steps. Investigating root causes.
* all contents in a batch are obtained correctly, but dtype mismatches.
* fix dtype for text tokens and codec codes.

Signed-off-by: Xuesong Yang <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_shar_prep] add script to create shar dataset.
* with more efficient changes.
* bugfix: previously the last batch would be dropped if its size was less than the buffer size; this fixes it.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] clean up commented lines.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] bugfix to force spawn over fork to address CUDA initialization errors when multiple workers are used during validation.

Signed-off-by: Xuesong Yang <[email protected]>
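
Background on the fork-vs-spawn fix, as a self-contained sketch (toy dataset; not the PR's dataloader code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor([idx])

# 'fork' duplicates the parent process, including any live CUDA context,
# which CUDA does not support in child processes; 'spawn' starts fresh
# interpreter processes and avoids the initialization error.
loader = DataLoader(
    ToyDataset(),
    batch_size=2,
    num_workers=2,
    multiprocessing_context="spawn",
)
```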

* [lhotse_dataloader] avoid setting up the tokenizer again for training since it is already set up during model initialization.

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_dataloader] switch to setting up the tokenizer inside __getitem__ to support spawn worker processes.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpietts][lhotse] fixed a bug in attatch_tensor which saved a wrong numpy array; update yaml config

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_config] enforce quadratic_duration if using lhotse dataloader to avoid frequent OOMs.
changed yaml name to monologue

Signed-off-by: Xuesong Yang <[email protected]>
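
A hedged sketch of how quadratic_duration is used with Lhotse's DynamicBucketingSampler (paths and values are illustrative):

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_shar(in_dir="data/shar")  # illustrative path

# quadratic_duration adds a length-squared penalty to each cut's effective
# duration, so batches that contain long utterances hold fewer cuts. This
# roughly tracks the quadratic memory cost of attention and curbs OOMs.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=600.0,       # illustrative: seconds of audio per batch
    quadratic_duration=20.0,  # illustrative penalty scale
    num_buckets=10,
    shuffle=True,
)
```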

* [magpie][example] add LR logger.

Signed-off-by: Xuesong Yang <[email protected]>

* cleanup

Signed-off-by: Xuesong Yang <[email protected]>

* [lhotse_yaml] made changes for yaml config according to comments.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] added docstring for lhotse dataset

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] remove yamls

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] remove Edresson's lhotse implementations, and update yaml name.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] add a README with guidance on how to create lhotse data

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][lhotse_dataset] update MonoCut example.

Signed-off-by: Xuesong Yang <[email protected]>

* rename config

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Jason <[email protected]>
* add fix to infer script

Signed-off-by: Jason <[email protected]>

* add no context option

Signed-off-by: Jason <[email protected]>

* add nemo option to infer script

Signed-off-by: Jason <[email protected]>

* add in latest bf16 changes from Edresson

Signed-off-by: Jason <[email protected]>

* add comment

Signed-off-by: Jason <[email protected]>

* enforce codec precision for now

Signed-off-by: Jason <[email protected]>

* fix autocast bug

Signed-off-by: Jason <[email protected]>

* another bug fix

Signed-off-by: Jason <[email protected]>

* clean PR

Signed-off-by: Jason <[email protected]>

* change hardcoded epsilon

Signed-off-by: Jason <[email protected]>

* infer changes

Signed-off-by: Jason <[email protected]>

* address review

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
* bug fix in context text embedding initialization

Signed-off-by: Paarth Neekhara <[email protected]>

* bug fixes in infer and evaluate

Signed-off-by: Paarth Neekhara <[email protected]>

---------

Signed-off-by: Paarth Neekhara <[email protected]>
Signed-off-by: Paarth Neekhara <[email protected]>
Make sure to reserve enough tokens for special uses like EOS/BOS.

WARNING: old models will be incompatible with the updated inference YAMLs
and will need to override the num_audio_tokens_per_codebook to the value they were
trained with.
#51)

* preference optimization updates, trainer updates, remove redundant datagen class

Signed-off-by: Shehzeen Hussain <[email protected]>

* revert model pt change, add freeze_model function

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove redundant inference class

Signed-off-by: Shehzeen Hussain <[email protected]>

* remove custom freeze model function and use lightning inbuilt freeze instead

Signed-off-by: Shehzeen Hussain <[email protected]>

* added a readme for magpie preference optimization

Signed-off-by: Shehzeen Hussain <[email protected]>

* change class name from MagpieTTSModelInference to MagpieTTSModelPrefDataGen

Signed-off-by: Shehzeen Hussain <[email protected]>

* update class name from MagpieTTSModelPrefDataGen to MagpieTTSModelOfflinePODataGen

Signed-off-by: Shehzeen Hussain <[email protected]>

---------

Signed-off-by: Shehzeen Hussain <[email protected]>
…codes (#66)

* [magpie][wandb] add logging for pad ratios for text tokens and audio codes.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][wandb] fix pad ratio calculation

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* Bugfix: num_audio_tokens_per_codebook

Make sure to reserve enough tokens for special uses like EOS/BOS.

WARNING: old models will be incompatible with the updated inference YAMLs
and will need to override the num_audio_tokens_per_codebook to the value they were
trained with.

* Rework how the number of codes and codebooks is handled (WIP)

* Reorder the code a bit for clarity

* Refactor codebook configuration

* read codec parameters from codec checkpoint; remove corresponding configuration from Magpie YAML files
* add mechanism for backward compatibility with older checkpoints:
** If using `infer_and_evaluate.py`, just set the --legacy_codebooks command line flag
** If running training or inference with the Hydra command line, override using the following flags:
```
forced_num_all_tokens_per_codebook: 2048
forced_audio_bos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -1}           # 2047
forced_audio_eos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -2}           # 2046
forced_context_audio_bos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -4}   # 2044
forced_context_audio_eos_id: ${sum:${model.forced_num_all_tokens_per_codebook}, -3}   # 2045
```
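
Note that `${sum:...}` is not an OmegaConf builtin, so a custom resolver presumably gets registered somewhere in NeMo. A minimal equivalent, for illustration only:

```python
from omegaconf import OmegaConf

# Register a resolver so ${sum:a,b} interpolations work (sketch, not the
# exact NeMo registration code):
OmegaConf.register_new_resolver("sum", lambda *args: sum(args), replace=True)

cfg = OmegaConf.create(
    {
        "forced_num_all_tokens_per_codebook": 2048,
        "forced_audio_bos_id": "${sum:${forced_num_all_tokens_per_codebook},-1}",
    }
)
print(cfg.forced_audio_bos_id)  # 2047
```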

* Add README on the codebook reorganization

... and how to load legacy checkpoints.

* Cleanup

* Cleanup and fixing typos

* Cleanup

* Cleanup

* Clarify the README on the embedding table layout

* README cleanup

* Rename an attribute for clarity

codec_model_downsample_factor --> codec_model_samples_per_frame
…nd image on the sliding bar instead of incrementing by 1. (#61)

* [magpie][wandb][bugfix] ensure consistent validation step for audio and image on the sliding bar instead of incrementing by 1.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpietts][loggers] support logging metrics using multiple loggers enabled in exp_manager.

* [magpietts][lhotse_dataset] remove useless imports and functions.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
* Refine the README on codebook layout updates

* Typo fix

* Bugfix: wire in the `legacy_codebooks` flag in a missing place
* add update config to infer script

Signed-off-by: Jason <[email protected]>

* Update infer_and_evaluate.py

---------

Signed-off-by: Jason <[email protected]>
@github-actions github-actions bot removed the Run CICD label Oct 30, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces MagpieTTS, a text-to-speech model with support for training, inference, evaluation, and preference optimization. The changes include:

  • Core MagpieTTS model implementation and preference optimization variants
  • Comprehensive evaluation and inference scripts with metric computation (CER, WER, SSIM, UTMOSv2, FCD)
  • Lhotse dataset integration for efficient data processing and sharding
  • Test coverage for transformer modules, FCD metrics, and Lhotse filters
  • Utility scripts for data preparation, context audio extraction, and codec processing

Reviewed Changes

Copilot reviewed 61 out of 62 changed files in this pull request and generated 29 comments.

Summary per file:

  • tests/collections/tts/modules/test_fcd_metric.py: Adds comprehensive unit tests for the Frechet Codec Distance metric
  • tests/collections/tts/modules/test_transformer_2501.py: Updates transformer tests to include mask parameters and adds batched inference tests
  • tests/collections/common/test_lhotse_tts_filters.py: Adds tests for Lhotse dataset filters (CER, speaker similarity, validation status)
  • tests/collections/common/test_lhotse_dataloading.py: Removes a duplicate test function
  • scripts/magpietts/*.py: Adds evaluation, inference, data preparation, and codec extraction scripts
  • scripts/magpietts/dpo/*.py: Adds DPO/RPO preference pair creation scripts
  • nemo/collections/tts/modules/utmosv2.py: Adds a UTMOSv2 MOS estimation wrapper
  • nemo/collections/tts/modules/encodec_modules.py: Adds properties for num_codebooks and codebook_size
  • nemo/utils/nemo_logging.py: Adds a stacklevel parameter to logging calls for better source location reporting
  • nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py: Fixes a typo and improves the AggregatedTTSTokenizer implementation

@pytest.mark.unit
def test_codebooks_mismatch_update(self, metric, device, codec):
    """Test that the FCD metric doesn't crash when provided with incorrect number ofcodebooks."""

Copilot AI Oct 30, 2025


Missing space between 'of' and 'codebooks' in the docstring.

Comment on lines 1455 to 1459
# @property
# def codebook_size(self):
# """Returns the size of the implicit codebook."""
# return self.codebook_size_per_group**self.num_groups


Copilot AI Oct 30, 2025


This comment appears to contain commented-out code.

Suggested change
# @property
# def codebook_size(self):
# """Returns the size of the implicit codebook."""
# return self.codebook_size_per_group**self.num_groups

import os
import random
import re
import time

Copilot AI Oct 30, 2025


Import of 'time' is not used.

'alignment_loss': alignment_loss,
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

'batch_metrics': generated_codes_and_metrics['metrics'],
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

Suggested change
def training_step(self, batch, batch_idx):
def ptl_training_step(self, batch, batch_idx):

'text': context_tensors['text'],
'text_lens': context_tensors['text_lens'],
'context_audio_codes': context_tensors['context_audio_codes'],
'context_audio_codes_lens': context_tensors['context_audio_codes_lens'],
'dec_context_size': dec_context_size,
'aligner_attn_soft': aligner_attn_soft,
'aligner_attn_hard': aligner_attn_hard,
}

def training_step(self, batch, batch_idx):

Copilot AI Oct 30, 2025


This method is shadowed by attribute training_step in superclass ModelPT.

print("...Making Shars")
out_shar_dir = Path(out_shar_dir)
out_shar_dir.mkdir(parents=True, exist_ok=True)
shard_size = shard_size

Copilot AI Oct 30, 2025


This assignment assigns a variable to itself.

num_audio_samples = num_codec_frames * self.codec_model_samples_per_frame
return num_audio_samples

def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List]]:

Check notice: Code scanning / CodeQL

Non-standard exception raised in special method (Note)

This method raises ValueError; it should raise a LookupError (KeyError or IndexError) instead.
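
The convention matters because Python's fallback iteration and membership protocols treat IndexError/KeyError as "not found", while any other exception propagates out of the loop. A small standalone example:

```python
class Frames:
    """No __iter__: Python's fallback iteration calls __getitem__ with
    0, 1, 2, ... and stops cleanly only on IndexError."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, idx):
        if idx >= len(self._items):
            raise IndexError(idx)  # a ValueError here would escape the for-loop
        return self._items[idx]

for frame in Frames(["a", "b"]):
    print(frame)  # prints "a", "b", then the loop ends without error
```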

Copilot Autofix


To adhere to Python conventions for __getitem__, you should change the exception type in line 232 from ValueError to KeyError. This involves editing the specific line:

  • File: nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
  • Region: within the __getitem__ method, specifically at item access check (lines 230–232).
  • Change: Instead of raise ValueError(...), use raise KeyError(...).

No additional imports are required; KeyError is a built-in exception. No additional definitions or code changes needed.

Suggested changeset 1
nemo/collections/tts/data/text_to_speech_dataset_lhotse.py

Autofix patch. Run the following command in your local git repository to apply it:
cat << 'EOF' | git apply
diff --git a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
--- a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
+++ b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
@@ -229,7 +229,7 @@
         for cut in cuts:
             speaker = cut.supervisions[0].speaker
             if not check_speaker_format(speaker):
-                raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
+                raise KeyError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
             dataset_name = speaker.strip().split()[2].split(":")[-1]
             dataset_name_list.append(dataset_name)
 
EOF

self.target_sample_rate = target_sample_rate
self.codec_model_samples_per_frame = codec_model_samples_per_frame

def __getitem__(self, cuts: CutSet) -> Optional[Dict[str, Any]]:

Check notice: Code scanning / CodeQL

Non-standard exception raised in special method (Note)

This method raises ValueError in four places; each should raise a LookupError (KeyError or IndexError) instead.

Copilot Autofix


To fix this problem, we should change all cases where ValueError is raised in the __getitem__ method of AudioPairLhotseDataset and instead raise KeyError. This includes the branches where required keys ("shard_origin", "context_recording") are missing from cut.custom, and where "shard_origin" does not match the required pattern for extracting a shard index. Only replace the exceptions in this method; ensure the error message is preserved so debugging remains clear. Only edit within the bounds of the shown code—do not change anything else or add unnecessary imports.

Suggested changeset 1
scripts/magpietts/extend_lhotse_shards_with_audio_codes.py

Autofix patch. Run the following command in your local git repository to apply it:
cat << 'EOF' | git apply
diff --git a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
--- a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
+++ b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
@@ -147,17 +147,17 @@
             if not cut.has_custom("shard_origin"):
                 err_msg = f"Cut {cut} is missing required key 'shard_origin'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
             if not cut.has_custom("context_recording"):
                 err_msg = f"Cut {cut} is missing required key 'context_recording'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
 
             # Parse shard index from the custom field, handling potential errors
             origin_path = cut.custom["shard_origin"]
             match = re.search(r"cuts\.(\d+)\.jsonl\.gz$", origin_path)
             if match is None:
-                raise ValueError(f"Could not parse shard index from shard_origin: {origin_path}")
+                raise KeyError(f"Could not parse shard index from shard_origin: {origin_path}")
             shard_idx_origin = int(match.group(1))
 
             # audio shape: (num_channels (1), num_samples) -> (num_samples)
EOF
@blisc blisc marked this pull request as ready for review November 4, 2025 19:24
@blisc blisc closed this Nov 4, 2025
@github-actions github-actions bot removed the Run CICD label Nov 4, 2025