Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tentative lhotse --> kaldi manifests conversion for multiple channels #962

Merged
merged 12 commits into from
Feb 3, 2023

Conversation

popcornell
Copy link
Contributor

As title says, works with CHiME6 + DiPCo + Mixer6 Speech.
But may break in other circumstances.

@popcornell
Copy link
Contributor Author

popcornell commented Jan 30, 2023

NOTE: isort installation fails on my side causing pre-commit hook to fail.

I removed it for now from pre-commit hook. But I can re-add it back later once tests here are fine

@desh2608
Copy link
Collaborator

desh2608 commented Feb 1, 2023

Tagging @jtrmal since he's been working on the Kaldi import/export stuff.

@pzelasko We need this option to support the ESPNet2 baseline for the CHiME-challenge (espnet/espnet#4894). My main concern right now is that the exported manifests for multi-channel recording won't be consistent when imported back into Lhotse, but I think most of the current use-case for this is limited to exporting the manifests, so perhaps we can get this checked in?

@pzelasko
Copy link
Collaborator

pzelasko commented Feb 1, 2023

Thanks guys! (and nice too see you back here @popcornell :))

Does this PR introduce any regression? As long as there's no regression for the existing cases, I am OK with incosistencies on importing back.

@pzelasko pzelasko added this to the v1.13 milestone Feb 1, 2023
@desh2608
Copy link
Collaborator

desh2608 commented Feb 1, 2023

Thanks guys! (and nice too see you back here @popcornell :))

Does this PR introduce any regression? As long as there's no regression for the existing cases, I am OK with incosistencies on importing back.

It shouldn't change anything for the existing single-channel export AFAIK, but let's wait for the tests to confirm that.

Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, LGTM!

@popcornell
Copy link
Contributor Author

@pzelasko Ahaha yeah I am back, amazed on how much lhotse has grown.

I think consistency is impossible for CHiME-6 like datasets where you have multi-channel data splitted over multiple single channel files (even for the same device).
These will be dumped as single channel kaldi entries in wav.scp but they are regarded as multi-channel in lhotse no ?

f"ffmpeg -threads 1 -i {source.source} -ar {sampling_rate} "
f"-map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |"
)
if len(source.channels) == 1:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this maybe we want to change in future BTW

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a comment here to point out limitations. Something like NOTE (@popcornelle): ....

@pzelasko
Copy link
Collaborator

pzelasko commented Feb 1, 2023

I think consistency is impossible for CHiME-6 like datasets where you have multi-channel data splitted over multiple single channel files (even for the same device). These will be dumped as single channel kaldi entries in wav.scp but they are regarded as multi-channel in lhotse no ?

That's right, Kaldi data directories don't really "know" about multi-channel recordings, we could probably work around it by adding some convention like special recording-id suffixes that help group relevant wav.scp lines back into a single Recording, but if nobody needs it today, there's likely no point.

BTW don't worry about failing test for python 3.9, it's an RNG flake, we can merge it anyway once the rest finishes.

@jtrmal
Copy link
Collaborator

jtrmal commented Feb 2, 2023 via email

@desh2608
Copy link
Collaborator

desh2608 commented Feb 2, 2023

@popcornell is this ready to merge?

@jtrmal
Copy link
Collaborator

jtrmal commented Feb 2, 2023 via email

},
}


@pytest.mark.xfail(reason="multi file recordings not supported yet")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtrmal This multi-channel recording test which was expected to fail earlier is now passing.

@popcornell
Copy link
Contributor Author

@desh2608 On my side everything seems to work right now.

@pzelasko
Copy link
Collaborator

pzelasko commented Feb 3, 2023

Thanks @popcornell and @desh2608, merging!

@pzelasko pzelasko merged commit a792260 into lhotse-speech:master Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants