removing the need for jsons dependency #18
What is the speed difference when feature extraction (FE) runs on CPU vs. GPU? This needs to be evaluated before setting the default, for both short and long audio, in the batched and sequential cases.
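One way to run such a comparison is a small timing harness that reports mean and standard deviation per call, similar in spirit to `%timeit`. This is only a sketch: `bench` and `fake_feature_extractor` are hypothetical names, and the stand-in workload is not the PR's feature extractor. For GPU work, a synchronization callable (for example `torch.cuda.synchronize`) would need to be passed as `sync` so that asynchronous kernels are included in the measurement.

```python
import statistics
import timeit


def bench(fn, repeats=7, loops=10, sync=None):
    """Return (mean, stdev) in seconds per call over `repeats` runs of `loops` calls."""

    def timed():
        fn()
        if sync is not None:
            # Assumption: `sync` blocks until device work finishes,
            # e.g. torch.cuda.synchronize when timing GPU code.
            sync()

    timed()  # warm-up call, excluded from the measurement
    runs = timeit.repeat(timed, repeat=repeats, number=loops)
    per_call = [total / loops for total in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


def fake_feature_extractor():
    # Hypothetical stand-in workload; replace with the real FE call.
    return sum(i * i for i in range(10_000))


mean_s, std_s = bench(fake_feature_extractor)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```

Running the same harness once per device and per audio length (short and long, batched and sequential) would give directly comparable numbers.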
20s audio
5.51 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # CPU
1.1 ms ± 506 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) # GPU
10min audio
76.3 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # CPU
8.06 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # GPU
Around a 5x speedup for short audio and roughly 9.5x (close to 10x) for long audio.
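The speedup figures follow directly from the mean timings reported above. A quick check, using only those numbers (the dictionary layout and the `speedup` helper are illustrative, not part of the PR):

```python
# Mean timings in milliseconds, copied from the benchmark output above.
timings_ms = {
    "20s audio": {"cpu": 5.51, "gpu": 1.1},
    "10min audio": {"cpu": 76.3, "gpu": 8.06},
}


def speedup(cpu, gpu):
    """Ratio of CPU time to GPU time; values above 1 mean the GPU is faster."""
    return cpu / gpu


speedups = {name: speedup(**t) for name, t in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

Note that the 20s GPU measurement used a single loop per run and has a comparatively large standard deviation, so the short-audio ratio is the less certain of the two.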
This is not Whisper's original torch implementation. Update the docstring to specify that the FE is torchaudio-based.