-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Update MagpieTTS model with latest changes #15031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Jason <[email protected]>
| num_audio_samples = num_codec_frames * self.codec_model_samples_per_frame | ||
| return num_audio_samples | ||
|
|
||
| def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List]]: |
Check notice
Code scanning / CodeQL
Non-standard exception raised in special method Note
ValueError
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
To fix the problem, any exception raised directly as a result of missing, invalid, or malformed items inside __getitem__ should use IndexError (or KeyError). In this code, the specific section:
231: if not check_speaker_format(speaker):
232: raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}")should simply be changed to raise IndexError rather than ValueError, as the problem is a failed lookup of a properly-formatted item in the collection. No new methods or imports are needed since IndexError is a built-in exception.
Only this replacement is required.
-
Copy modified line R232
| @@ -229,7 +229,7 @@ | ||
| for cut in cuts: | ||
| speaker = cut.supervisions[0].speaker | ||
| if not check_speaker_format(speaker): | ||
| raise ValueError(f"Invalid format in cut.supervisions[0].speaker: {speaker}") | ||
| raise IndexError(f"Invalid format in cut.supervisions[0].speaker: {speaker}") | ||
| dataset_name = speaker.strip().split()[2].split(":")[-1] | ||
| dataset_name_list.append(dataset_name) | ||
|
|
| codes, codes_len = self.model.encode_from_file(audio_path) | ||
| self.update(codes, codes_len, is_real) | ||
|
|
||
| def update(self, codes: Tensor, codes_len: Tensor, is_real: bool): |
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
To fix the problem, all return statements in the update function should return the same value, namely the instance itself (self). Instead of using bare return statements (which implicitly return None) for the early exits on lines 174 and 180, change them to return self. This ensures that any exit path through the function returns self, making the function's return value consistent, more readable, and less error-prone for callers. Only this function (update in nemo/collections/tts/modules/fcd_metric.py) needs to be modified.
No other imports, method definitions, or variable definitions are required.
-
Copy modified line R174 -
Copy modified line R180
| @@ -171,13 +171,13 @@ | ||
|
|
||
| if codes.numel() == 0: | ||
| logging.warning("FCD metric received an empty batch of codes - skipping update") | ||
| return | ||
| return self | ||
|
|
||
| if codes.shape[1] != self.model.codec.num_codebooks: | ||
| logging.warning( | ||
| f"FCD metric received a batch of codes of shape {codes.shape}, but the model has {self.model.codec.num_codebooks} codebooks - skipping update" | ||
| ) | ||
| return | ||
| return self | ||
|
|
||
| # Dequantize the codes to a continuous representation | ||
| embeddings = self.model.codes_to_embedding( |
| self.target_sample_rate = target_sample_rate | ||
| self.codec_model_samples_per_frame = codec_model_samples_per_frame | ||
|
|
||
| def __getitem__(self, cuts: CutSet) -> Optional[Dict[str, Any]]: |
Check notice
Code scanning / CodeQL
Non-standard exception raised in special method Note
ValueError
This method raises
ValueError
This method raises
ValueError
This method raises
ValueError
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
To fix this issue, wherever we raise ValueError in the __getitem__ method of AudioPairLhotseDataset due to missing expected keys (shard_origin, context_recording), we should raise KeyError instead, since the error is caused by lookup failure for those keys.
- Specifically, replace
raise ValueError(err_msg)on lines 150 and 154 withraise KeyError(err_msg). - On line 160, where parsing the shard index from a string fails, it's valid to raise a
ValueErrorsince the string format is unexpected (not a lookup failure). So that case should remain as is.
No new methods or imports are required.
-
Copy modified line R150 -
Copy modified line R154
| @@ -147,11 +147,11 @@ | ||
| if not cut.has_custom("shard_origin"): | ||
| err_msg = f"Cut {cut} is missing required key 'shard_origin'." | ||
| logging.error(err_msg) | ||
| raise ValueError(err_msg) | ||
| raise KeyError(err_msg) | ||
| if not cut.has_custom("context_recording"): | ||
| err_msg = f"Cut {cut} is missing required key 'context_recording'." | ||
| logging.error(err_msg) | ||
| raise ValueError(err_msg) | ||
| raise KeyError(err_msg) | ||
|
|
||
| # Parse shard index from the custom field, handling potential errors | ||
| origin_path = cut.custom["shard_origin"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the MagpieTTS model with the latest development changes, including enhanced transformer architecture, new preference optimization methods, improved testing infrastructure, and expanded utility modules for audio codec processing and evaluation.
Key Changes:
- Introduced online (GRPO) and offline (DPO/RPO) preference optimization training modes
- Enhanced transformer architecture with improved attention mechanisms and masking support
- Added comprehensive evaluation scripts and metrics (FCD, UTMOSv2)
- Expanded audio codec modules with new quantizers and encoders
Reviewed Changes
Copilot reviewed 53 out of 54 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/functional_tests/*.sh | New functional test scripts for MagpieTTS inference and training modes |
| tests/collections/tts/modules/test_transformer_2501.py | Added mask parameters and batched inference tests for transformer |
| tests/collections/tts/modules/test_fcd_metric.py | New tests for Frechet Codec Distance metric |
| tests/collections/common/test_lhotse_*.py | Tests for Lhotse data filtering and duplicate removal |
| scripts/magpietts/*.py | New evaluation, inference, and data processing scripts |
| scripts/magpietts/README_magpie_po.md | Documentation for preference optimization workflows |
| requirements/requirements_tts.txt | Added UTMOSv2 dependency |
| nemo/utils/nemo_logging.py | Added stacklevel parameter to logging calls and docstrings |
| nemo/collections/tts/parts/utils/helpers.py | Enhanced masking with pad_to_factor and attention prior visualization |
| nemo/collections/tts/parts/utils/callbacks.py | Removed experimental decorator |
| nemo/collections/tts/parts/preprocessing/*.py | Removed experimental decorators and improved formatting |
| nemo/collections/tts/modules/*.py | New modules for UTMOSv2, FCD metric, and MagpieTTS components |
| nemo/collections/tts/modules/transformer_2501.py | Enhanced with masking support and improved attention mechanisms |
| nemo/collections/tts/modules/encodec_modules.py | Added properties for codebook metadata |
| nemo/collections/tts/modules/audio_codec_modules.py | Extensive additions including new encoders, decoders, and quantizers |
| nemo/collections/tts/models/magpietts_preference_optimization.py | New preference optimization model implementations |
| nemo/collections/tts/models/init.py | Updated imports for renamed MagpieTTS models |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| --cfg_scale 2.5 \ | ||
| --num_repeats 1 \ | ||
| --temperature 0.6 \ | ||
| --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \ |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'alignement' to 'alignment'
| --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignement_0.002_priorscale_0.1.yaml \ | |
| --hparams_files /home/TestData/tts/2506_ZeroShot/lrhm_short_yt_prioralways_alignment_0.002_priorscale_0.1.yaml \ |
| --n_text_contexts_per_challenging_text 2 \ | ||
| --n_audio_contexts_per_regular_text 1 \ | ||
| --n_text_contexts_per_regular_text 1 \ | ||
| --nsamples_perpair 1 ; |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'perpair' to 'per_pair'
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
…in parakeet inference to test segmentation fault Signed-off-by: Jason <[email protected]>
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| coverage run --branch -a --data-file=/workspace/.coverage --source=/workspace/nemo examples/tts/magpietts.py \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failure is passing in --branch to the scripts here. Please update so that it's coverage run -a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for that catch!
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
…; attempt to catch error Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
What does this PR do ?
Updates MagpieTTS with latest dev changes.
Collection: tts
Changelog