-
Notifications
You must be signed in to change notification settings - Fork 223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tentative lhotse --> kaldi manifests conversion for multiple channels #962
Conversation
NOTE: isort installation fails on my side causing pre-commit hook to fail. I removed it for now from pre-commit hook. But I can re-add it back later once tests here are fine |
Tagging @jtrmal since he's been working on the Kaldi import/export stuff. @pzelasko We need this option to support the ESPNet2 baseline for the CHiME-challenge (espnet/espnet#4894). My main concern right now is that the exported manifests for multi-channel recording won't be consistent when imported back into Lhotse, but I think most of the current use-case for this is limited to exporting the manifests, so perhaps we can get this checked in? |
Thanks guys! (and nice too see you back here @popcornell :)) Does this PR introduce any regression? As long as there's no regression for the existing cases, I am OK with incosistencies on importing back. |
It shouldn't change anything for the existing single-channel export AFAIK, but let's wait for the tests to confirm that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, LGTM!
@pzelasko Ahaha yeah I am back, amazed on how much lhotse has grown. I think consistency is impossible for CHiME-6 like datasets where you have multi-channel data splitted over multiple single channel files (even for the same device). |
f"ffmpeg -threads 1 -i {source.source} -ar {sampling_rate} " | ||
f"-map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |" | ||
) | ||
if len(source.channels) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this maybe we want to change in future BTW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a comment here to point out limitations. Something like NOTE (@popcornelle): ....
That's right, Kaldi data directories don't really "know" about multi-channel recordings, we could probably work around it by adding some convention like special recording-id suffixes that help group relevant wav.scp lines back into a single Recording, but if nobody needs it today, there's likely no point. BTW don't worry about failing test for python 3.9, it's an RNG flake, we can merge it anyway once the rest finishes. |
As long as tests finish, I'm fine with it. The only long-term issue will be
if there are different conventions for multichannel audio in kaldi recipes
audio files, but let's not worry about that for now
y.
…On Wed, Feb 1, 2023 at 8:12 PM Piotr Żelasko ***@***.***> wrote:
I think consistency is impossible for CHiME-6 like datasets where you have
multi-channel data splitted over multiple single channel files (even for
the same device). These will be dumped as single channel kaldi entries in
wav.scp but they are regarded as multi-channel in lhotse no ?
That's right, Kaldi data directories don't really "know" about
multi-channel recordings, we could probably work around it by adding some
convention like special recording-id suffixes that help group relevant
wav.scp lines back into a single Recording, but if nobody needs it today,
there's likely no point.
BTW don't worry about failing test for python 3.9, it's an RNG flake, we
can merge it anyway once the rest finishes.
—
Reply to this email directly, view it on GitHub
<#962 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACUKYX7XY5N4IDZ74PHY7ILWVKYSFANCNFSM6AAAAAAULBRFSI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@popcornell is this ready to merge? |
WOuld be great if you guys could add tests testing the multi-channel setup
y.
…On Thu, Feb 2, 2023 at 2:35 PM Desh Raj ***@***.***> wrote:
@popcornell <https://github.com/popcornell> is this ready to merge?
—
Reply to this email directly, view it on GitHub
<#962 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACUKYX2RSBOATNI5EVPKOVDWVOZ3RANCNFSM6AAAAAAULBRFSI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
}, | ||
} | ||
|
||
|
||
@pytest.mark.xfail(reason="multi file recordings not supported yet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtrmal This multi-channel recording test which was expected to fail earlier is now passing.
@desh2608 On my side everything seems to work right now. |
Thanks @popcornell and @desh2608, merging! |
As title says, works with CHiME6 + DiPCo + Mixer6 Speech.
But may break in other circumstances.