[workflow] Word-level forced alignment with pretrained models from Torchaudio #827

pzelasko · 2022-09-28T19:15:23Z

As a follow-up to #824, I'm adding another workflow for forced word-level alignment. It may be interesting to combine the two workflows. I ended up using the torchaudio model because the tutorial example looks quite convincing with regard to word timestamp accuracy, but I did not evaluate whether it's better or worse than any other approach. We can add other forced-alignment workflows later to offer a choice.

@desh2608 I also refactored AlignmentItem a bit so that it's a NamedTuple again, for the sake of efficiency. After using it for some time, though, I don't like this approach. I think we should eventually change it to something like a numpy array (or a collection of them) and maybe not store alignments in the manifest. Anyway, at least the size of the manifests is greatly reduced with the NamedTuple approach.

desh2608

Super-fast development on this one! LGTM, in general.

Regarding alignments, I agree it's not very efficient. Perhaps treating them in the same way as Features would be a better option.

desh2608 · 2022-09-28T20:07:28Z

Here's one approach off the top of my head. We can have an AlignmentObject that contains 2 things:

1. A dict which is just a mapping from a symbol (e.g., word, phone) to an int.
2. A path to a numpy file, containing rows of the form (symbol_idx, start_time, end_time, score).
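
The two-part layout proposed above could be sketched roughly as follows. This is only an illustration of the idea, not code from this PR or from Lhotse: the `AlignmentObject` name comes from the comment, while the `save`/`load` methods, the field names, and the use of a single float64 array are assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

import numpy as np

# (symbol, start_time, end_time, score); a hypothetical row type for this sketch.
Row = Tuple[str, float, float, float]


@dataclass
class AlignmentObject:
    """Hypothetical container: a symbol table plus a path to a numpy file."""

    symbol_to_idx: Dict[str, int]
    path: Path

    @classmethod
    def save(cls, items: Iterable[Row], path) -> "AlignmentObject":
        # np.save appends ".npy" when it is missing, so normalize the suffix first.
        path = Path(path).with_suffix(".npy")
        symbol_to_idx: Dict[str, int] = {}
        rows = []
        for symbol, start, end, score in items:
            # Assign the next free integer to symbols we have not seen yet.
            idx = symbol_to_idx.setdefault(symbol, len(symbol_to_idx))
            rows.append((idx, start, end, score))
        # Rows have the form (symbol_idx, start_time, end_time, score).
        np.save(path, np.asarray(rows, dtype=np.float64).reshape(-1, 4))
        return cls(symbol_to_idx=symbol_to_idx, path=path)

    def load(self) -> List[Row]:
        # Invert the symbol table to map stored indices back to symbols.
        idx_to_symbol = {i: s for s, i in self.symbol_to_idx.items()}
        return [
            (idx_to_symbol[int(i)], float(start), float(end), float(score))
            for i, start, end, score in np.load(self.path)
        ]
```

With this layout the manifest would only need to carry the symbol table and the file path, while the timing data stays in the array on disk.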

lhotse/bin/modes/workflows.py

csukuangfj · 2022-09-29T15:54:30Z

lhotse/supervision.py

    """

    symbol: str
    start: Seconds
    duration: Seconds
+    score: Optional[float] = None


What does the score indicate?

It's an aligner-specific "confidence score". In the case of torchaudio, their tutorial suggests using the average token probability for a given segment, but it could also be computed differently. I'll add a proper note.
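
To make the answer above concrete, here is a minimal sketch of how such a score might be attached to an alignment item. The field names mirror the diff excerpt; `average_score` is a hypothetical helper, and taking a plain mean of per-frame token probabilities is just one reading of "average token probability", not necessarily what the workflow in this PR computes.

```python
from typing import NamedTuple, Optional, Sequence

Seconds = float  # assumed alias for this sketch, mirroring the type name in the diff


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds
    # Aligner-specific confidence; None when the aligner provides no score.
    score: Optional[float] = None


def average_score(frame_probs: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: mean token probability over a segment's frames."""
    if not frame_probs:
        return None
    return sum(frame_probs) / len(frame_probs)


# Example: a word spanning two frames with token probabilities 0.5 and 1.0.
item = AlignmentItem("hello", start=0.0, duration=0.5, score=average_score([0.5, 1.0]))
```

Because `score` defaults to `None`, alignments produced without a confidence value can still be constructed with three fields.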

pzelasko added 2 commits September 28, 2022 15:09

Word-level forced alignment with pretrained models from Torchaudio

b7f5df4

Remove commented out code

ab0326b

pzelasko added this to the v1.8 milestone Sep 28, 2022

pzelasko requested a review from desh2608 September 28, 2022 19:15

Try workaround Python 3.6 issues with NamedTuple

2e1c200

desh2608 previously approved these changes Sep 28, 2022

Handle text normalization

dbf82f1

pzelasko dismissed desh2608’s stale review via dbf82f1 September 28, 2022 20:52

pzelasko added 6 commits September 28, 2022 16:57

Handle failed alignments

fc570b5

Fix an issue with multi-supervision cut offsets

60e3e09

Fix for plot_alignments

017bbc0

Fix plot_alignments

528f79f

Adjust the start time by the offset of a cut/supervision

c0541b6

Allow dashes for piping workflows together 🤯

42b9b27

pzelasko marked this pull request as ready for review September 29, 2022 13:44

pzelasko added 4 commits September 29, 2022 09:44

Merge branch 'master' into feature/torchaudio-forced-alignment-workflow

8008e6f

Merge branch 'master' into feature/torchaudio-forced-alignment-workflow

2a07c18

Add tests, clean up the code

21a3029

Merge remote-tracking branch 'origin/feature/torchaudio-forced-alignm…

065d0fe

…ent-workflow' into feature/torchaudio-forced-alignment-workflow

csukuangfj reviewed Sep 29, 2022

pzelasko and others added 3 commits September 29, 2022 12:01

Update lhotse/bin/modes/workflows.py

f157edb

Co-authored-by: Fangjun Kuang <[email protected]>

Address review

26f9a0e

Merge remote-tracking branch 'origin/feature/torchaudio-forced-alignm…

2b3acbd

…ent-workflow' into feature/torchaudio-forced-alignment-workflow

pzelasko merged commit 99bbad8 into master Sep 29, 2022

yaozengwei mentioned this pull request Feb 8, 2023

Get alignments using lhotse workflows align-with-torchaudio k2-fsa/icefall#888

Merged
