Batch extraction for kaldi features #947
Conversation
Thanks! LGTM once the tests pass. You might need to try tweaking some settings, maybe the configurations for both extractors in the test are not identical.
```python
assert feats.shape == (1604, 257)

def test_kaldi_spectrogram_extractor_vs_torchaudio(recording):
```
This test is currently failing with the following message:
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 3874 / 412228 (0.9%)
E Greatest absolute difference: 22.66506841033697 at index (826, 0) (up to 1e-05 allowed)
E Greatest relative difference: 17694.401589361405 at index (1029, 0) (up to 0.0001 allowed)
On changing the assertion to `torch.testing.assert_allclose(feats, np.exp(feats_ta))`, I get:
E AssertionError: Tensor-likes are not close!
E
E Mismatched elements: 1604 / 412228 (0.4%)
E Greatest absolute difference: 8.267975032700633 at index (325, 0) (up to 1e-05 allowed)
E Greatest relative difference: 0.999999999856555 at index (826, 0) (up to 0.0001 allowed)
1604 is the number of frames, so it seems the two outputs only differ in column 0 (one mismatched element per frame).
Ahh, the first dimension is the log energy for both, so we don't need to exponentiate that for comparing.
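To illustrate the fix described above (a minimal sketch with hypothetical names, not the actual test code): since column 0 of both outputs is log energy, only the remaining log-magnitude columns of the torchaudio output should be exponentiated before comparing.

```python
import numpy as np

def compare_spectrograms(feats, feats_ta_log):
    # Column 0 is log energy in BOTH outputs; columns 1: of the
    # torchaudio output are log-magnitudes, so exponentiate only those.
    expected = feats_ta_log.copy()
    expected[:, 1:] = np.exp(feats_ta_log[:, 1:])
    np.testing.assert_allclose(feats, expected, rtol=1e-4, atol=1e-5)
```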
@pzelasko Made a small fix to the test and now it's passing. Ready to merge.
LGTM - please do the honors :)
@pzelasko I noticed some room for optimization in the batched feature extraction. Before this PR, we had 2 extractors that support batch extraction: kaldifeat and S3PRL. Both of these take a list of tensors and perform padding/collation internally, so we don't need to collate beforehand. In this PR, we add a batch extraction method for the Kaldi-compatible extractors as well. Another small optimization is to off-load the task of saving the extracted features to a background thread. I will write some tests for the newly added features if they seem reasonable to you.
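A minimal sketch of the background-thread saving idea (hypothetical helper names and an in-memory dict standing in for Lhotse's feature writers): the main loop keeps extracting while a single worker thread handles storage.

```python
from concurrent.futures import ThreadPoolExecutor

storage = {}  # stand-in for an on-disk feature writer

def _save_worker(cut_ids, features):
    # Runs in a background thread so extraction is not blocked by I/O.
    for cut_id, feats in zip(cut_ids, features):
        storage[cut_id] = feats

def extract_and_store(batches, extract_fn):
    futures = []
    with ThreadPoolExecutor(max_workers=1) as executor:
        for cut_ids, audio in batches:
            features = extract_fn(audio)  # e.g. batched (GPU) extraction
            futures.append(executor.submit(_save_worker, cut_ids, features))
        # Surface any exceptions raised inside the worker thread.
        for f in futures:
            f.result()
```

Using a single worker keeps writes ordered, which matters for sequential storage formats.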
```python
futures.append(executor.submit(_save_worker, cuts, features))
progress.update(len(cuts))
```
Something that I find weird is that this works even though I have not called `future.result()` anywhere?
When the executor is used as a context manager, it blocks execution on `__exit__` until all submitted tasks have finished (it calls `shutdown(wait=True)`, which joins the worker threads).
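A small standalone sketch of that behavior: right after the `with` block, all submitted tasks have completed even though `.result()` was never called.

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = []

def slow_task(i):
    time.sleep(0.05)
    results.append(i)

with ThreadPoolExecutor(max_workers=2) as executor:
    for i in range(4):
        executor.submit(slow_task, i)

# __exit__ has called shutdown(wait=True), so every task is done here.
print(sorted(results))  # [0, 1, 2, 3]
```

One caveat: an exception raised inside a task is silently swallowed unless `future.result()` is called, which is a good reason to collect and check the futures anyway.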
Makes sense, thanks!
Sounds good to me! Once you finish, I think I will use your changes to build an example of how to use Lhotse Shar (so the data will be stored sequentially in shards) for re-storing recordings and then computing the features. I have been postponing writing the tutorials for quite a while now, but that data format works fantastically in real-world settings.
Awesome! I'm mostly done with this PR now, but I will sit on it until tomorrow perhaps, and use the new batch extractors on some of my tasks to make sure everything works seamlessly. I will ping you when I think it is ready to review/merge.
@pzelasko I think this can be merged now.
Thanks!!
The outputs can be numpy arrays due to the logic in `FeatureExtractor.extract_batch` added in #947
This PR adds an `extract_batch()` method for the Kaldi-compatible features implemented in Lhotse.

NOTE: We rename Torchaudio's `Spectrogram` feature to `TorchaudioSpectrogram` to be consistent with `TorchaudioMfcc` and `TorchaudioFbank`.
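As noted in the discussion, batch extractors accept a list of variable-length tensors and pad/collate them internally. A minimal sketch of that collation step (a hypothetical helper, not Lhotse's actual implementation):

```python
import numpy as np

def collate_audio(waveforms):
    """Pad 1-D arrays to a common length; return the batch and true lengths."""
    lengths = np.array([len(w) for w in waveforms])
    batch = np.zeros((len(waveforms), lengths.max()), dtype=np.float32)
    for i, w in enumerate(waveforms):
        batch[i, : len(w)] = w
    return batch, lengths
```

Keeping the true lengths alongside the padded batch lets the extractor trim padding frames from each output afterwards.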