Handling invalid audio generations (for DPO) #43

rfejgin · 2025-01-27T20:40:26Z

When T5TTS generates a very short output audio (on the order of fewer than 1000 samples), the ASR and speaker similarity calculations fail. The ASR in particular needs at least two feature frames' worth of samples.

Here, we detect the ASR error and write out invalid metrics for the entire batch: a WER/CER of 100% and an SSIM of 0.0. We also set the predicted transcript to <INVALID> so that subsequent processing can handle these entries appropriately.
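To illustrate the fallback described above, here is a minimal sketch. The names `Metrics`, `invalid_metrics`, `score_batch` and the `score_fn` callback are hypothetical and are not taken from the PR's code. The sketch also assumes WER/CER are stored as fractions (1.0 for 100%) and catches a broad `Exception`, since the exact error raised by the ASR is not stated here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

INVALID_TRANSCRIPT = "<INVALID>"


@dataclass
class Metrics:
    wer: float
    cer: float
    ssim: float
    pred_transcript: str


def invalid_metrics() -> Metrics:
    # WER/CER of 100% (stored as 1.0, an assumption) and SSIM of 0.0.
    return Metrics(wer=1.0, cer=1.0, ssim=0.0, pred_transcript=INVALID_TRANSCRIPT)


def score_batch(
    audio_batch: Sequence[Sequence[float]],
    score_fn: Callable[[Sequence[Sequence[float]]], List[Metrics]],
) -> List[Metrics]:
    """Score a batch; if scoring fails, mark every entry in the batch invalid."""
    try:
        return list(score_fn(audio_batch))
    except Exception:
        # Assumption: real code would catch the specific ASR error type instead.
        return [invalid_metrics() for _ in audio_batch]
```

Note that the whole batch is invalidated, not only the short entry, which matches the behavior described above.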

For DPO preference pair creation we skip any record-group that has at least one entry with invalid metrics.
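The group-skipping rule could look roughly like the sketch below. It assumes each record is a dict with a `pred_transcript` field and that records are already grouped by some key; both the field name and the grouping structure are assumptions for illustration, not the script's actual data layout.

```python
from typing import Dict, List

INVALID_TRANSCRIPT = "<INVALID>"


def is_invalid(record: dict) -> bool:
    # A record counts as invalid if its transcript carries the marker.
    return record.get("pred_transcript") == INVALID_TRANSCRIPT


def filter_valid_groups(groups: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Keep only groups in which no record is invalid."""
    return {
        key: records
        for key, records in groups.items()
        if not any(is_invalid(r) for r in records)
    }
```

For example, a group with three valid records and one `<INVALID>` record is dropped entirely, so no preference pair is built from it.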

@shehzeen and @paarthneekhara could you review? @paarthneekhara please feel free to merge when ready. Thank you!

When the model generates an output that is very short (less than 2 ASR frames), the ASR and SSIM calculations will error out. We detect the error and invalidate the entire batch, setting WER/CER to 100% and SSIM to 0.0. The transcription is set to "<INVALID>". Note that the metrics are still written out to the `.metrics` files; they need to be ignored by any subsequent statistics calculations.
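As a sketch of what "ignoring" invalid entries in downstream statistics might mean, the snippet below averages only the valid records. The record fields (`cer`, `ssim`, `pred_transcript`) and the `summarize` helper are assumed for illustration; the actual `.metrics` file format is not shown in this PR.

```python
from statistics import mean
from typing import List, Optional

INVALID_TRANSCRIPT = "<INVALID>"


def summarize(records: List[dict]) -> Optional[dict]:
    """Average CER and SSIM over valid records, reporting how many were skipped."""
    valid = [r for r in records if r["pred_transcript"] != INVALID_TRANSCRIPT]
    if not valid:
        return None  # nothing to average
    return {
        "cer": mean(r["cer"] for r in valid),
        "ssim": mean(r["ssim"] for r in valid),
        "num_invalid": len(records) - len(valid),
    }
```

Without the filter, a single invalidated batch (CER 100%, SSIM 0.0) would skew the averages noticeably.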

github-actions · 2025-01-30T01:35:38Z

beep boop 🤖: 🚨 The following files must be fixed before merge!

Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.tts.models.t5tts
nemo/collections/tts/models/t5tts.py:135:0: C0301: Line too long (147/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:157:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:167:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:172:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:175:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:185:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:253:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:254:0: C0301: Line too long (160/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:366:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:382:0: C0301: Line too long (140/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:393:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:441:0: C0301: Line too long (139/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:447:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:448:0: C0301: Line too long (145/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:461:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:495:0: C0301: Line too long (158/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:504:0: C0301: Line too long (207/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:507:0: C0301: Line too long (204/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:510:0: C0301: Line too long (131/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:511:0: C0301: Line too long (166/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:519:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:531:0: C0301: Line too long (131/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:601:0: C0301: Line too long (133/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:603:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:617:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:618:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:620:0: C0301: Line too long (134/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:621:0: C0301: Line too long (136/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:623:0: C0301: Line too long (131/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:625:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:654:0: C0301: Line too long (165/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:655:0: C0301: Line too long (134/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:709:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:710:0: C0301: Line too long (151/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:712:0: C0301: Line too long (168/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:713:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:730:0: C0301: Line too long (135/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:739:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:751:0: C0301: Line too long (129/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:761:0: C0301: Line too long (158/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:762:0: C0301: Line too long (193/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:769:0: C0301: Line too long (135/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:770:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:794:0: C0301: Line too long (140/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:795:0: C0301: Line too long (127/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:878:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:885:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:899:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:919:0: C0301: Line too long (166/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:934:0: C0301: Line too long (120/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:950:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:952:0: C0301: Line too long (188/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:973:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:977:0: C0301: Line too long (130/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1081:0: C0301: Line too long (136/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1088:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1091:0: C0301: Line too long (161/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1094:0: C0301: Line too long (191/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1096:0: C0301: Line too long (155/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1135:0: C0301: Line too long (127/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1148:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1168:0: C0301: Line too long (154/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1169:0: C0301: Line too long (176/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1172:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1197:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1198:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1199:0: C0301: Line too long (127/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1200:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1201:0: C0301: Line too long (143/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1203:0: C0301: Line too long (163/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1208:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1219:0: C0301: Line too long (206/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1238:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1240:0: C0301: Line too long (122/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1275:0: C0301: Line too long (143/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1276:0: C0301: Line too long (151/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1278:0: C0301: Line too long (165/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:1279:0: C0301: Line too long (173/119) (line-too-long)
nemo/collections/tts/models/t5tts.py:57:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:87:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:101:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/tts/models/t5tts.py:199:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:203:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:214:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:227:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:239:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:264:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:278:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:291:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:301:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:326:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:339:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:358:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:376:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:393:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:428:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:444:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:454:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:556:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:587:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:676:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:692:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:724:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:822:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:865:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:987:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:1026:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:1138:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/tts/models/t5tts.py:1249:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/tts/models/t5tts.py:15:0: W0611: Unused ceil imported from math (unused-import)
nemo/collections/tts/models/t5tts.py:17:0: W0611: Unused import omegaconf (unused-import)
nemo/collections/tts/models/t5tts.py:43:0: W0611: Unused OmegaConf imported from omegaconf (unused-import)
nemo/collections/tts/models/t5tts.py:52:4: W0611: Unused import wandb (unused-import)
************* Module scripts.t5tts.dpo.create_preference_pairs
scripts/t5tts/dpo/create_preference_pairs.py:12:0: C0301: Line too long (140/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:20:0: C0301: Line too long (165/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:50:0: C0301: Line too long (124/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:54:0: C0301: Line too long (120/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:67:0: C0301: Line too long (134/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:68:0: C0301: Line too long (130/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:226:0: C0301: Line too long (166/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:227:0: C0301: Line too long (173/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:228:0: C0301: Line too long (209/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:236:0: C0301: Line too long (128/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:255:0: C0301: Line too long (152/119) (line-too-long)
scripts/t5tts/dpo/create_preference_pairs.py:9:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:74:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:82:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:88:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:170:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:185:0: C0116: Missing function or method docstring (missing-function-docstring)
scripts/t5tts/dpo/create_preference_pairs.py:243:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 8.75/10

Thank you for improving NeMo's documentation!

rfejgin added 3 commits January 27, 2025 11:05

DPO: changes to preference pair creation

275d51c

1. Skip groups that have any invalid records.
2. Allow the number of records to exactly match the number of audio files (vs requiring it to be strictly smaller).
3. Add `tqdm` to indicate progress during long loops.
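Changes 2 and 3 in this commit might be sketched as follows. The function names and the error message are hypothetical; only `tqdm` itself comes from the commit message. The `ImportError` fallback is added here just so the snippet runs without `tqdm` installed.

```python
from typing import List

try:
    from tqdm import tqdm  # third-party progress bar
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: iterate without showing progress.
        return iterable


def check_record_count(num_records: int, num_audio_files: int) -> None:
    """Accept num_records <= num_audio_files (previously strictly smaller)."""
    if num_records > num_audio_files:
        raise ValueError(
            f"{num_records} records but only {num_audio_files} audio files"
        )


def process_records(records: List[dict]) -> List[dict]:
    """Placeholder loop showing where the progress bar wraps the iteration."""
    processed = []
    for record in tqdm(records, desc="Processing records"):
        processed.append(dict(record, processed=True))
    return processed
```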

Comment

2524a27

github-actions bot added the TTS label Jan 27, 2025

rfejgin changed the title ~~Experimentalt5tts finalizedtransformer~~ Handling invalid audio generations (for DPO) Jan 27, 2025

rfejgin added 3 commits January 29, 2025 16:09

Merge remote-tracking branch 'paarth/experimentalt5tts_finalizedtrans…

7709f33

…former' into experimentalt5tts_finalizedtransformer

Fix merge issues and a bug

09ab0aa

Refining the handling of invalid entries in DPO preference selection.

Fix merge issues

33a0f3e

rfejgin marked this pull request as ready for review January 30, 2025 01:55

rfejgin assigned paarthneekhara and unassigned paarthneekhara Jan 30, 2025

rfejgin requested a review from paarthneekhara January 30, 2025 01:59

paarthneekhara merged commit d70b903 into paarthneekhara:experimentalt5tts_finalizedtransformer Jan 30, 2025
4 of 8 checks passed
