Skip to content

Conversation

@blisc
Copy link
Collaborator

@blisc blisc commented Nov 4, 2025

What does this PR do ?

Updates MagpieTTS with latest dev changes.

Collection: tts

Changelog

  • Updates MagpieTTS codebase

Signed-off-by: Jason <[email protected]>
num_audio_samples = num_codec_frames * self.codec_model_samples_per_frame
return num_audio_samples

def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List]]:

Check notice

Code scanning / CodeQL

Non-standard exception raised in special method Note

This method raises
ValueError
- should raise a LookupError (KeyError or IndexError) instead.

Copilot Autofix

AI 4 days ago

To fix the problem, any exception raised directly as a result of missing, invalid, or malformed items inside __getitem__ should use IndexError (or KeyError). In this code, the specific section:

231:             if not check_speaker_format(speaker):
232:                 raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")

should simply be changed to raise IndexError rather than ValueError, as the problem is a failed lookup of a properly-formatted item in the collection. No new methods or imports are needed since IndexError is a built-in exception.

Only this replacement is required.

Suggested changeset 1
nemo/collections/tts/data/text_to_speech_dataset_lhotse.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
--- a/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
+++ b/nemo/collections/tts/data/text_to_speech_dataset_lhotse.py
@@ -229,7 +229,7 @@
         for cut in cuts:
             speaker = cut.supervisions[0].speaker
             if not check_speaker_format(speaker):
-                raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
+                raise IndexError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
             dataset_name = speaker.strip().split()[2].split(":")[-1]
             dataset_name_list.append(dataset_name)
 
EOF
@@ -229,7 +229,7 @@
for cut in cuts:
speaker = cut.supervisions[0].speaker
if not check_speaker_format(speaker):
raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
raise IndexError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")
dataset_name = speaker.strip().split()[2].split(":")[-1]
dataset_name_list.append(dataset_name)

Copilot is powered by AI and may make mistakes. Always verify output.
codes, codes_len = self.model.encode_from_file(audio_path)
self.update(codes, codes_len, is_real)

def update(self, codes: Tensor, codes_len: Tensor, is_real: bool):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.

Copilot Autofix

AI 4 days ago

To fix the problem, all return statements in the update function should return the same value, namely the instance itself (self). Instead of using bare return statements (which implicitly return None) for the early exits on lines 174 and 180, change them to return self. This ensures that any exit path through the function returns self, making the function's return value consistent, more readable, and less error-prone for callers. Only this function (update in nemo/collections/tts/modules/fcd_metric.py) needs to be modified.

No other imports, method definitions, or variable definitions are required.

Suggested changeset 1
nemo/collections/tts/modules/fcd_metric.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/tts/modules/fcd_metric.py b/nemo/collections/tts/modules/fcd_metric.py
--- a/nemo/collections/tts/modules/fcd_metric.py
+++ b/nemo/collections/tts/modules/fcd_metric.py
@@ -171,13 +171,13 @@
 
         if codes.numel() == 0:
             logging.warning("FCD metric received an empty batch of codes - skipping update")
-            return
+            return self
 
         if codes.shape[1] != self.model.codec.num_codebooks:
             logging.warning(
                 f"FCD metric received a batch of codes of shape {codes.shape}, but the model has {self.model.codec.num_codebooks} codebooks - skipping update"
             )
-            return
+            return self
 
         # Dequantize the codes to a continuous representation
         embeddings = self.model.codes_to_embedding(
EOF
@@ -171,13 +171,13 @@

if codes.numel() == 0:
logging.warning("FCD metric received an empty batch of codes - skipping update")
return
return self

if codes.shape[1] != self.model.codec.num_codebooks:
logging.warning(
f"FCD metric received a batch of codes of shape {codes.shape}, but the model has {self.model.codec.num_codebooks} codebooks - skipping update"
)
return
return self

# Dequantize the codes to a continuous representation
embeddings = self.model.codes_to_embedding(
Copilot is powered by AI and may make mistakes. Always verify output.
self.target_sample_rate = target_sample_rate
self.codec_model_samples_per_frame = codec_model_samples_per_frame

def __getitem__(self, cuts: CutSet) -> Optional[Dict[str, Any]]:

Check notice

Code scanning / CodeQL

Non-standard exception raised in special method Note

This method raises
ValueError
- should raise a LookupError (KeyError or IndexError) instead.
This method raises
ValueError
- should raise a LookupError (KeyError or IndexError) instead.
This method raises
ValueError
- should raise a LookupError (KeyError or IndexError) instead.
This method raises
ValueError
- should raise a LookupError (KeyError or IndexError) instead.

Copilot Autofix

AI 4 days ago

To fix this issue, wherever we raise ValueError in the __getitem__ method of AudioPairLhotseDataset due to missing expected keys (shard_origin, context_recording), we should raise KeyError instead, since the error is caused by lookup failure for those keys.

  • Specifically, replace raise ValueError(err_msg) on lines 150 and 154 with raise KeyError(err_msg).
  • On line 160, where parsing the shard index from a string fails, it's valid to raise a ValueError since the string format is unexpected (not a lookup failure). So that case should remain as is.

No new methods or imports are required.


Suggested changeset 1
scripts/magpietts/extend_lhotse_shards_with_audio_codes.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
--- a/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
+++ b/scripts/magpietts/extend_lhotse_shards_with_audio_codes.py
@@ -147,11 +147,11 @@
             if not cut.has_custom("shard_origin"):
                 err_msg = f"Cut {cut} is missing required key 'shard_origin'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
             if not cut.has_custom("context_recording"):
                 err_msg = f"Cut {cut} is missing required key 'context_recording'."
                 logging.error(err_msg)
-                raise ValueError(err_msg)
+                raise KeyError(err_msg)
 
             # Parse shard index from the custom field, handling potential errors
             origin_path = cut.custom["shard_origin"]
EOF
@@ -147,11 +147,11 @@
if not cut.has_custom("shard_origin"):
err_msg = f"Cut {cut} is missing required key 'shard_origin'."
logging.error(err_msg)
raise ValueError(err_msg)
raise KeyError(err_msg)
if not cut.has_custom("context_recording"):
err_msg = f"Cut {cut} is missing required key 'context_recording'."
logging.error(err_msg)
raise ValueError(err_msg)
raise KeyError(err_msg)

# Parse shard index from the custom field, handling potential errors
origin_path = cut.custom["shard_origin"]
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the MagpieTTS model with the latest development changes, including enhanced transformer architecture, new preference optimization methods, improved testing infrastructure, and expanded utility modules for audio codec processing and evaluation.

Key Changes:

  • Introduced online (GRPO) and offline (DPO/RPO) preference optimization training modes
  • Enhanced transformer architecture with improved attention mechanisms and masking support
  • Added comprehensive evaluation scripts and metrics (FCD, UTMOSv2)
  • Expanded audio codec modules with new quantizers and encoders

Reviewed Changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/functional_tests/*.sh New functional test scripts for MagpieTTS inference and training modes
tests/collections/tts/modules/test_transformer_2501.py Added mask parameters and batched inference tests for transformer
tests/collections/tts/modules/test_fcd_metric.py New tests for Frechet Codec Distance metric
tests/collections/common/test_lhotse_*.py Tests for Lhotse data filtering and duplicate removal
scripts/magpietts/*.py New evaluation, inference, and data processing scripts
scripts/magpietts/README_magpie_po.md Documentation for preference optimization workflows
requirements/requirements_tts.txt Added UTMOSv2 dependency
nemo/utils/nemo_logging.py Added stacklevel parameter to logging calls and docstrings
nemo/collections/tts/parts/utils/helpers.py Enhanced masking with pad_to_factor and attention prior visualization
nemo/collections/tts/parts/utils/callbacks.py Removed experimental decorator
nemo/collections/tts/parts/preprocessing/*.py Removed experimental decorators and improved formatting
nemo/collections/tts/modules/*.py New modules for UTMOSv2, FCD metric, and MagpieTTS components
nemo/collections/tts/modules/transformer_2501.py Enhanced with masking support and improved attention mechanisms
nemo/collections/tts/modules/encodec_modules.py Added properties for codebook metadata
nemo/collections/tts/modules/audio_codec_modules.py Extensive additions including new encoders, decoders, and quantizers
nemo/collections/tts/models/magpietts_preference_optimization.py New preference optimization model implementations
nemo/collections/tts/models/init.py Updated imports for renamed MagpieTTS models

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

--cfg_scale 2.5 \
--num_repeats 1 \
--temperature 0.6 \
--hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'alignement' to 'alignment'

Suggested change
--hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \
--hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignment_0.002_priorscale_0.1.yaml \

Copilot uses AI. Check for mistakes.
--n_text_contexts_per_challenging_text 2 \
--n_audio_contexts_per_regular_text 1 \
--n_text_contexts_per_regular_text 1 \
--nsamples_perpair 1 ;
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'perpair' to 'per_pair'

Copilot uses AI. Check for mistakes.
@github-actions github-actions bot removed the Run CICD label Nov 5, 2025
…in parakeet inference to test segmentation fault

Signed-off-by: Jason <[email protected]>
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
coverage run --branch -a --data-file=/workspace/.coverage --source=/workspace/nemo examples/tts/magpietts.py \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the failure is passing in --branch to the scripts here. Please update so that it's coverage run -a.

https://github.com/NVIDIA-NeMo/NeMo/blob/main/tests/functional_tests/L2_Speaker_dev_run_Speech_to_Label.sh

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for that catch!

@github-actions github-actions bot removed the Run CICD label Nov 6, 2025
@blisc blisc marked this pull request as draft November 7, 2025 14:56
@blisc blisc added the Run CICD label Nov 7, 2025
@github-actions github-actions bot removed the Run CICD label Nov 7, 2025
Signed-off-by: Jason <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants