Update MagpieTTS model with latest changes #15031

blisc · 2025-11-04T19:29:11Z

What does this PR do ?

Updates MagpieTTS with latest dev changes.

Collection: tts

Changelog

Updates MagpieTTS codebase

Signed-off-by: Jason <[email protected]>

nemo/collections/tts/data/text_to_speech_dataset_lhotse.py

+        num_audio_samples = num_codec_frames * self.codec_model_samples_per_frame
+        return num_audio_samples
+
+    def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List]]:


To fix the problem, any exception raised directly as a result of missing, invalid, or malformed items inside __getitem__ should use IndexError (or KeyError). In this code, the specific section:

    if not check_speaker_format(speaker):
        raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")

should simply be changed to raise IndexError rather than ValueError, as the problem is a failed lookup of a properly-formatted item in the collection. No new methods or imports are needed since IndexError is a built-in exception.

Only this replacement is required.
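A minimal sketch of the check discussed above, for illustration only. It assumes the speaker field looks like `| Language:en Dataset:LibriTTS Speaker:123 |`; the regular expression and this version of `check_speaker_format` are hypothetical stand-ins, since the real helper's body is not shown in this PR. Only the `split()[2].split(":")[-1]` parse is taken from the diff.

```python
import re

# Hypothetical pattern: the actual format accepted by NeMo's helper is an assumption here.
_SPEAKER_RE = re.compile(r"^\| Language:\w+ Dataset:\w+ Speaker:\w+ \|$")


def check_speaker_format(speaker: str) -> bool:
    """Stand-in for the real helper; returns True if the string matches the assumed layout."""
    return bool(_SPEAKER_RE.match(speaker.strip()))


def dataset_name_from_speaker(speaker: str) -> str:
    if not check_speaker_format(speaker):
        # Exception type as proposed by the autofix suggestion.
        raise IndexError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
    # Same parse as in the diff: third whitespace-separated token, text after the last ':'.
    return speaker.strip().split()[2].split(":")[-1]
```

Under these assumptions, a well-formed speaker string yields the dataset name, and a malformed one raises before the `split()` indexing can fail with a less descriptive error.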

nemo/collections/tts/modules/fcd_metric.py

+        codes, codes_len = self.model.encode_from_file(audio_path)
+        self.update(codes, codes_len, is_real)
+
+    def update(self, codes: Tensor, codes_len: Tensor, is_real: bool):


To fix the problem, all return statements in the update function should return the same value, namely the instance itself (self). Instead of using bare return statements (which implicitly return None) for the early exits on lines 174 and 180, change them to return self. This ensures that any exit path through the function returns self, making the function's return value consistent, more readable, and less error-prone for callers. Only this function (update in nemo/collections/tts/modules/fcd_metric.py) needs to be modified.

No other imports, method definitions, or variable definitions are required.
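The suggestion can be sketched with a toy class. `CountingMetric` and its fields are hypothetical and are not the FCD metric implementation; the sketch only shows every exit path of `update` returning the same value.

```python
from typing import List


class CountingMetric:
    """Hypothetical toy metric used only to illustrate consistent return values."""

    def __init__(self, num_codebooks: int) -> None:
        self.num_codebooks = num_codebooks
        self.num_updates = 0

    def update(self, codes: List[List[int]]) -> "CountingMetric":
        # Early exit 1: empty batch, skip the update but still return self.
        if len(codes) == 0:
            return self
        # Early exit 2: wrong number of codebooks, skip the update but still return self.
        if len(codes) != self.num_codebooks:
            return self
        self.num_updates += 1
        return self
```

With this shape, a chained call such as `metric.update(a).update(b)` works even when one of the batches is skipped, which would fail with a bare `return`.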

scripts/magpietts/extend_lhotse_shards_with_audio_codes.py

+        self.target_sample_rate = target_sample_rate
+        self.codec_model_samples_per_frame = codec_model_samples_per_frame
+
+    def __getitem__(self, cuts: CutSet) -> Optional[Dict[str, Any]]:


To fix this issue, wherever we raise ValueError in the __getitem__ method of AudioPairLhotseDataset due to missing expected keys (shard_origin, context_recording), we should raise KeyError instead, since the error is caused by lookup failure for those keys.

Specifically, replace raise ValueError(err_msg) on lines 150 and 154 with raise KeyError(err_msg).

On line 160, where parsing the shard index from a string fails, it's valid to raise a ValueError since the string format is unexpected (not a lookup failure). So that case should remain as is.

No new methods or imports are required.
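The distinction drawn above, sketched on a plain dictionary standing in for `cut.custom`. The function name and the `shard_000012.tar` naming pattern are assumptions made for illustration; the actual parsing code is not shown in this PR excerpt.

```python
import re
from typing import Any, Dict


def shard_index_from_custom(custom: Dict[str, Any]) -> int:
    """Hypothetical helper: validate required keys, then parse the shard index."""
    for key in ("shard_origin", "context_recording"):
        if key not in custom:
            # Lookup failure: the key is absent, so KeyError describes the problem.
            raise KeyError(f"Cut is missing required key '{key}'.")
    origin_path = str(custom["shard_origin"])
    # Assumed naming scheme: digits immediately before a '.tar' extension.
    match = re.search(r"(\d+)\.tar$", origin_path)
    if match is None:
        # The key exists but its value is malformed, so ValueError still fits.
        raise ValueError(f"Could not parse shard index from '{origin_path}'.")
    return int(match.group(1))
```

A caller can then tell a missing field (`KeyError`) apart from a present but unparseable one (`ValueError`).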

Copilot

Pull Request Overview

This PR updates the MagpieTTS model with the latest development changes, including enhanced transformer architecture, new preference optimization methods, improved testing infrastructure, and expanded utility modules for audio codec processing and evaluation.

Key Changes:

Introduced online (GRPO) and offline (DPO/RPO) preference optimization training modes
Enhanced transformer architecture with improved attention mechanisms and masking support
Added comprehensive evaluation scripts and metrics (FCD, UTMOSv2)
Expanded audio codec modules with new quantizers and encoders

Reviewed Changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 2 comments.


File	Description
tests/functional_tests/*.sh	New functional test scripts for MagpieTTS inference and training modes
tests/collections/tts/modules/test_transformer_2501.py	Added mask parameters and batched inference tests for transformer
tests/collections/tts/modules/test_fcd_metric.py	New tests for Frechet Codec Distance metric
tests/collections/common/test_lhotse_*.py	Tests for Lhotse data filtering and duplicate removal
scripts/magpietts/*.py	New evaluation, inference, and data processing scripts
scripts/magpietts/README_magpie_po.md	Documentation for preference optimization workflows
requirements/requirements_tts.txt	Added UTMOSv2 dependency
nemo/utils/nemo_logging.py	Added stacklevel parameter to logging calls and docstrings
nemo/collections/tts/parts/utils/helpers.py	Enhanced masking with pad_to_factor and attention prior visualization
nemo/collections/tts/parts/utils/callbacks.py	Removed experimental decorator
nemo/collections/tts/parts/preprocessing/*.py	Removed experimental decorators and improved formatting
nemo/collections/tts/modules/*.py	New modules for UTMOSv2, FCD metric, and MagpieTTS components
nemo/collections/tts/modules/transformer_2501.py	Enhanced with masking support and improved attention mechanisms
nemo/collections/tts/modules/encodec_modules.py	Added properties for codebook metadata
nemo/collections/tts/modules/audio_codec_modules.py	Extensive additions including new encoders, decoders, and quantizers
nemo/collections/tts/models/magpietts_preference_optimization.py	New preference optimization model implementations
nemo/collections/tts/models/__init__.py	Updated imports for renamed MagpieTTS models


Copilot · 2025-11-04T19:35:19Z

tests/functional_tests/L2_TTS_InferEvaluate_Magpietts_ZeroShot.sh

+    --cfg_scale 2.5 \
+    --num_repeats 1 \
+    --temperature 0.6 \
+    --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \


Corrected spelling of 'alignement' to 'alignment'

Suggested change

-    --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \
+    --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignment_0.002_priorscale_0.1.yaml \

Copilot · 2025-11-04T19:35:20Z

scripts/magpietts/dpo/create_text_contextpairs.py

+    --n_text_contexts_per_challenging_text 2 \
+    --n_audio_contexts_per_regular_text 1 \
+    --n_text_contexts_per_regular_text 1 \
+    --nsamples_perpair 1 ;


Corrected spelling of 'perpair' to 'per_pair'


chtruong814 · 2025-11-06T16:59:23Z

tests/functional_tests/L2_TTS_Fast_dev_runs_Magpietts_DecoderContext.sh

+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+coverage run --branch -a --data-file=/workspace/.coverage --source=/workspace/nemo examples/tts/magpietts.py \


I think the failure comes from passing --branch to the script here. Please update it to use coverage run -a, as in this existing test:

https://github.com/NVIDIA-NeMo/NeMo/blob/main/tests/functional_tests/L2_Speaker_dev_run_Speech_to_Label.sh

blisc · Nov 6, 2025

Ah, thanks for that catch!


Update MagpieTTS

22a6ed5

Signed-off-by: Jason <[email protected]>

blisc marked this pull request as ready for review November 4, 2025 19:29

blisc requested review from chtruong814, ko3n1g, pablo-garay and thomasdhc as code owners November 4, 2025 19:29

github-actions bot added TTS CI common labels Nov 4, 2025

blisc added the Run CICD label Nov 4, 2025

blisc temporarily deployed to test November 4, 2025 19:30 — with GitHub Actions Inactive

blisc requested a review from Copilot November 4, 2025 19:34

github-advanced-security bot found potential problems Nov 4, 2025


Copilot AI reviewed Nov 4, 2025


github-actions bot removed the Run CICD label Nov 5, 2025

allow None in dataset path

0f53824

Signed-off-by: Jason <[email protected]>

blisc mentioned this pull request Nov 5, 2025

Fix MagpieTTS_ModelInference process_text: str.replace() doesn't modify in-place #15028

Closed

8 tasks

blisc added the Run CICD label Nov 5, 2025

blisc temporarily deployed to test November 5, 2025 15:51 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 5, 2025

try to fix test by removing lhotse; fix yamls in fast dev run tests

8c08aa9

Signed-off-by: Jason <[email protected]>

github-actions bot added the ASR label Nov 5, 2025

blisc added the Run CICD label Nov 5, 2025

blisc temporarily deployed to test November 5, 2025 21:17 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 6, 2025

increase zeroshot cer value; attempt to fix PO test; add back lhotse …

2ef4dd9

…in parakeet inference to test segmentation fault Signed-off-by: Jason <[email protected]>

github-actions bot removed the ASR label Nov 6, 2025

blisc added the Run CICD label Nov 6, 2025

blisc temporarily deployed to test November 6, 2025 15:13 — with GitHub Actions Inactive

chtruong814 reviewed Nov 6, 2025


github-actions bot removed the Run CICD label Nov 6, 2025

blisc added 3 commits November 6, 2025 13:14

remove branch from test

fa5e574

Signed-off-by: Jason <[email protected]>

use batch_size 1

eaffb27

Signed-off-by: Jason <[email protected]>

update GRPO test script

e81153c

Signed-off-by: Jason <[email protected]>

blisc added the Run CICD label Nov 6, 2025

blisc temporarily deployed to test November 6, 2025 21:35 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 7, 2025

add use_lhotse as a param to transcribe; attempt to fix PO test again…

426b583

…; attempt to catch error Signed-off-by: Jason <[email protected]>

blisc marked this pull request as draft November 7, 2025 14:56

github-actions bot added ASR audio labels Nov 7, 2025

blisc added the Run CICD label Nov 7, 2025

blisc temporarily deployed to test November 7, 2025 14:57 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 7, 2025

fix tests

b4231c9

Signed-off-by: Jason <[email protected]>

blisc added the Run CICD label Nov 7, 2025

blisc temporarily deployed to test November 7, 2025 15:47 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 7, 2025

update rnnt transcribe; fix po test again

b566cb6

Signed-off-by: Jason <[email protected]>

blisc added the Run CICD label Nov 7, 2025

chtruong814 added Run CICD and removed Run CICD labels Nov 7, 2025

chtruong814 temporarily deployed to test November 7, 2025 20:19 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Nov 8, 2025

nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
@@ -229,7 +229,7 @@
                     for cut in cuts:
                         speaker = cut.supervisions[0].speaker
                         if not check_speaker_format(speaker):
-                            raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
+                            raise IndexError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
                         dataset_name = speaker.strip().split()[2].split(":")[-1]
                         dataset_name_list.append(dataset_name)

nemo/collections/tts/modules/fcd_metric.py
@@ -171,13 +171,13 @@
                     if codes.numel() == 0:
                         logging.warning("FCD metric received an empty batch of codes - skipping update")
-                        return
+                        return self
                     if codes.shape[1] != self.model.codec.num_codebooks:
                         logging.warning(
                             f"FCD metric received a batch of codes of shape {codes.shape}, but the model has {self.model.codec.num_codebooks} codebooks - skipping update"
                         )
-                        return
+                        return self
                     # Dequantize the codes to a continuous representation
                     embeddings = self.model.codes_to_embedding(

scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
@@ -147,11 +147,11 @@
                         if not cut.has_custom("shard_origin"):
                             err_msg = f"Cut {cut} is missing required key 'shard_origin'."
                             logging.error(err_msg)
-                            raise ValueError(err_msg)
+                            raise KeyError(err_msg)
                         if not cut.has_custom("context_recording"):
                             err_msg = f"Cut {cut} is missing required key 'context_recording'."
                             logging.error(err_msg)
-                            raise ValueError(err_msg)
+                            raise KeyError(err_msg)
                         # Parse shard index from the custom field, handling potential errors
                         origin_path = cut.custom["shard_origin"]

3 participants
