
Batch extraction for kaldi features #947

Merged: 15 commits merged into lhotse-speech:master on Jan 19, 2023

Conversation

desh2608 (Collaborator)

This PR adds an extract_batch() method for the Kaldi-compatible features implemented in Lhotse.

NOTE: We rename Torchaudio's Spectrogram feature to TorchaudioSpectrogram to be consistent with TorchaudioMfcc and TorchaudioFbank.
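
For context, a minimal usage sketch of the new method (the import path and the exact extract_batch() signature are assumptions based on this discussion, not verified against the merged code):

import torch
from lhotse.features.kaldi.extractors import Fbank  # Kaldi-compatible, nn.Module-based extractor (assumed path)

extractor = Fbank()
# Two mono utterances of different lengths; extract_batch() is assumed to take a
# list of waveforms plus the sampling rate and return one feature matrix per input.
waves = [torch.randn(16000), torch.randn(24000)]
feats = extractor.extract_batch(waves, sampling_rate=16000)
for f in feats:
    print(f.shape)  # (num_frames, num_mel_bins)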

pzelasko (Collaborator) previously approved these changes Jan 18, 2023

@pzelasko left a comment:

Thanks! LGTM once the tests pass. You might need to try tweaking some settings; maybe the configurations for both extractors in the test are not identical.

assert feats.shape == (1604, 257)


def test_kaldi_spectrogram_extractor_vs_torchaudio(recording):
desh2608 (Collaborator, Author)

This test is currently failing with the following message:

E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 3874 / 412228 (0.9%)
E       Greatest absolute difference: 22.66506841033697 at index (826, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 17694.401589361405 at index (1029, 0) (up to 0.0001 allowed)

desh2608 (Collaborator, Author)

On changing the assertion to torch.testing.assert_allclose(feats, np.exp(feats_ta)), I get:

E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 1604 / 412228 (0.4%)
E       Greatest absolute difference: 8.267975032700633 at index (325, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 0.999999999856555 at index (826, 0) (up to 0.0001 allowed)

1604 is the number of frames, and it seems the two outputs only differ in the 0th feature column.

desh2608 (Collaborator, Author)

Ahh, the 0th column is the log energy for both, so we should not exponentiate that column when comparing.
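
For reference, a minimal sketch of the comparison fix described here (the helper name is hypothetical; the tolerances mirror the defaults shown in the error messages above):

import torch

def assert_spectrograms_close(feats: torch.Tensor, feats_ta: torch.Tensor) -> None:
    # Hypothetical helper: `feats` is the linear-domain Kaldi spectrogram and
    # `feats_ta` the log-domain torchaudio spectrogram; both carry log energy in
    # column 0, so only the remaining columns are exponentiated before comparing.
    feats_ta_lin = feats_ta.clone()
    feats_ta_lin[:, 1:] = feats_ta_lin[:, 1:].exp()
    torch.testing.assert_allclose(feats, feats_ta_lin)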

@desh2608 (Collaborator, Author)

> Thanks! LGTM once the tests pass. You might need to try tweaking some settings; maybe the configurations for both extractors in the test are not identical.

@pzelasko Made a small fix to the test and now it's passing. Ready to merge.

@desh2608 desh2608 changed the title Batch extraction for kaldi features [WIP] Batch extraction for kaldi features Jan 18, 2023
@desh2608 desh2608 changed the title [WIP] Batch extraction for kaldi features Batch extraction for kaldi features Jan 18, 2023
pzelasko (Collaborator) previously approved these changes Jan 18, 2023

@pzelasko left a comment:

LGTM - please do the honors :)

@desh2608 (Collaborator, Author)

@pzelasko I noticed some room for optimization in the batched feature extraction. Before this PR, we had 2 extractors that support batch extraction: kaldifeat and S3PRL. Both of them take a list of tensors and perform padding/collation internally, so we do not need to collate beforehand.

In this PR, we add a batch extraction method for the nn.Module-based Kaldi-compatible extractors (originally implemented by Jesus). These extractors take a batch of padded/collated samples as input, so an easy solution is to pad and batch the list of tensors before passing them to the extractor.

However, recall that the extract_batch() method is most often used from CutSet.compute_and_store_features_batch(), which uses PyTorch dataloaders to create batches in the background. Therefore, we add a collate option to this method (False by default), which specifies whether to collate the batch of samples. It can be set to True for the extractors that work with batched tensors, so that no additional GPU time is spent on collation (see the sketch below).

Another small optimization is to off-load the saving of the extracted features to a background thread.

I will write some tests for the newly added functionality if these changes seem reasonable to you.
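
To make the collate option concrete, here is a minimal sketch (not the actual Lhotse implementation; the padding logic and the extractor interface are simplified assumptions) of a batch extraction routine that either collates the waveforms itself or defers to an extractor that collates internally:

from typing import List, Sequence

import torch

def extract_batch(extractor, samples: Sequence[torch.Tensor], sampling_rate: int, collate: bool = False) -> List[torch.Tensor]:
    # Sketch only: `extractor.extract` is assumed to accept either a padded
    # (batch, num_samples) tensor or a single waveform, as in the discussion above.
    if collate:
        lengths = [s.shape[-1] for s in samples]
        batch = torch.zeros(len(samples), max(lengths))
        for i, s in enumerate(samples):
            batch[i, : lengths[i]] = s
        feats = extractor.extract(batch, sampling_rate)  # (batch, num_frames, num_features)
        # A real implementation would also trim the frames introduced by padding.
        return list(feats)
    # Extractors such as kaldifeat/S3PRL pad and collate internally.
    return [extractor.extract(s, sampling_rate) for s in samples]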


futures.append(executor.submit(_save_worker, cuts, features))
progress.update(len(cuts))

desh2608 (Collaborator, Author)

Something that I find weird is that this works even though I have not called future.result() anywhere?

pzelasko (Collaborator)

When the executor is used as a context manager, it blocks on __exit__ until all submitted tasks have finished (it calls shutdown(wait=True), which joins the worker threads).

desh2608 (Collaborator, Author)

Makes sense, thanks!
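
For reference, a minimal sketch of the pattern discussed above (the function and variable names are hypothetical, not the actual PR code), showing why no explicit future.result() call is needed:

from concurrent.futures import ThreadPoolExecutor

def _save_worker(cuts, features):
    # Hypothetical worker: write the extracted `features` for `cuts` to storage.
    ...

def store_all(batches):
    with ThreadPoolExecutor(max_workers=1) as executor:
        for cuts, features in batches:
            executor.submit(_save_worker, cuts, features)
    # Leaving the `with` block calls executor.shutdown(wait=True), which joins the
    # worker threads, so all submitted saves are complete here. Note that an
    # exception raised inside _save_worker is swallowed unless .result() is checked.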

@pzelasko (Collaborator)

Sounds good to me! Once you finish, I think I will use your changes to build an example of how to use Lhotse Shar (so the data will be stored sequentially in shards) for re-storing recordings and then computing the features. I have been postponing writing the tutorials for quite a while now, but that data format works fantastically in real-world settings.

@desh2608 (Collaborator, Author)

> Sounds good to me! Once you finish, I think I will use your changes to build an example of how to use Lhotse Shar (so the data will be stored sequentially in shards) for re-storing recordings and then computing the features. I have been postponing writing the tutorials for quite a while now, but that data format works fantastically in real-world settings.

Awesome! I'm mostly done with this PR now, but I will perhaps sit on it until tomorrow and use the new batch extractors on some of my tasks to make sure everything works seamlessly. I will ping you when I think it is ready to review/merge.

@desh2608 (Collaborator, Author)

@pzelasko I think this can be merged now.

@pzelasko (Collaborator)

Thanks!!

@pzelasko pzelasko merged commit 302013d into lhotse-speech:master Jan 19, 2023
@pzelasko pzelasko added this to the v1.13 milestone Jan 20, 2023
pzelasko added a commit that referenced this pull request Feb 1, 2023
The outputs can be numpy arrays due to the logic in
`FeatureExtractor.extract_batch` added in #947