Support Whisper #45
Conversation
guangy10
left a comment
@chmjkb Can you add a test for the Whisper model under optimum-executorch/tests/? The test should be straightforward; we only need to test two things: 1) export to ET, and 2) an e2e task using the PTE via the HF API.
Once we have the test, @michaelbenayoun can help kick off the CI.
Hi @guangy10, I rebased and added the test; let me know if that looks good 👍🏻
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
assert encoder_input_ids.shape == torch.Size(
    [1, 80, 3000]
), f"Whisper only accepts a log-mel spectrogram of shape [1, 80, 3000], passed shape: {encoder_input_ids.shape}"
```
@chmjkb This is the config for whisper-tiny only, right? If I switch to a different variant of Whisper, e.g. whisper-large-v3, it won't work. I think we can load this dynamically from preprocessor_config.json?
IIUC, each dim in encoder_input_ids represents [batch_size, feature_size, nb_max_frames]?
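As a sketch of the dynamic loading suggested here (hypothetical helper name; the `feature_size` and `nb_max_frames` fields do appear in Whisper preprocessor_config.json files, but worth double-checking per variant):

```python
import json

def encoder_input_shape(preprocessor_config: dict, batch_size: int = 1) -> tuple:
    """Derive the expected log-mel input shape [batch_size, feature_size,
    nb_max_frames] from the model's preprocessor config instead of
    hard-coding [1, 80, 3000]."""
    return (
        batch_size,
        preprocessor_config["feature_size"],
        preprocessor_config["nb_max_frames"],
    )

# Stand-ins for json.load(open("preprocessor_config.json")):
# whisper-tiny uses 80 mel bins, whisper-large-v3 uses 128.
tiny_cfg = json.loads('{"feature_size": 80, "nb_max_frames": 3000}')
large_cfg = json.loads('{"feature_size": 128, "nb_max_frames": 3000}')
print(encoder_input_shape(tiny_cfg))   # (1, 80, 3000)
print(encoder_input_shape(large_cfg))  # (1, 128, 3000)
```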
Yeah, I think whisper-large is an exception; it takes in 128 features instead of 80. Will fix that. (The smaller ones should work with 80.)
Actually, I'm thinking about getting rid of this assertion. Instead, I could just resolve the correct shape when instantiating example_encoder_input_ids (I'm doing this anyway). If a user passes the wrong shape for some reason, so be it.
Also, WhisperEncoder itself raises a ValueError when the length of the features is not correct.
WDYT?
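If some validation does stay, one option (a hypothetical helper, not code from this PR) is to check against the variant's resolved shape and raise a ValueError, mirroring the kind of error WhisperEncoder itself produces for mismatched feature lengths:

```python
def check_encoder_input_shape(shape, feature_size: int, nb_max_frames: int) -> None:
    """Validate a log-mel input shape against the variant's resolved config
    rather than a hard-coded [1, 80, 3000]. Raises ValueError on mismatch."""
    expected = (1, feature_size, nb_max_frames)
    if tuple(shape) != expected:
        raise ValueError(
            f"Whisper expects a log-mel spectrogram of shape {list(expected)}, "
            f"passed shape: {list(shape)}"
        )

check_encoder_input_shape((1, 80, 3000), 80, 3000)    # OK for whisper-tiny
check_encoder_input_shape((1, 128, 3000), 128, 3000)  # OK for whisper-large-v3
```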
```python
if isinstance(self.full_model, WhisperForConditionalGeneration):
    dynamic_shapes = None
else:
    # Define dynamic dimension for encoder output sequence length
    encoder_seq_len_dim = torch.export.Dim("encoder_hidden_seq_length", max=self.max_hidden_seq_length)
    dynamic_shapes = {
        "decoder_input_ids": None,
        "encoder_hidden_states": {1: encoder_seq_len_dim},
        "cache_position": None,
    }
```
@tugsbayasgalan @pianpwk Can we use Dim.AUTO here, to avoid setting dynamic_shapes explicitly for different models? In this case, Whisper would expect all static shapes, while T5 would want at least encoder_hidden_seq_length to be dynamic.
Yep, Dim.AUTO would be perfect here. That way, you don't need the if/else branching.
I tried changing the code to the following:
```python
dynamic_shapes = {
    "decoder_input_ids": None,
    "encoder_hidden_states": {1: torch.export.Dim.AUTO},
    "cache_position": None,
}
```
Unfortunately, when I do that, the export fails with the following error:
```
RuntimeError: Cannot evaluate the shape upper bound of a dynamic-shaped tensor to a concrete bounded integer. Got tensor spec: TensorSpec(dtype=torch.float32, shape=[1, s0, 384], layout=torch.strided, is_sparse=False, shape_dynamism=1, const=False, requires_grad=True). The upper bound shape we get [1, int_oo, 384], the upper bound stride we get [int_oo, 384, 1]. This tensor could either be from 1. a data-dependent operation such as nonzero. Or 2. an input, whose don't have a constraint for the upper bound. Please use export's constrain_as_size() or constrain_as_value() apis and set a concrete upper bound to resolve this.
```
However, doing:
```python
dynamic_shapes = {"input_ids": {1: torch.export.Dim.AUTO}}
```
for the Whisper encoder seems to work, but that in turn makes the T5 export fail :D
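Given that Dim.AUTO runs into the bounded-upper-bound requirement above, keeping explicit per-model branching seems reasonable. A torch-free sketch of that selection logic (hypothetical helper; `seq_len_dim` stands in for `torch.export.Dim("encoder_hidden_seq_length", max=...)`, injected so the sketch runs without torch):

```python
def resolve_dynamic_shapes(is_whisper: bool, seq_len_dim):
    """Pick per-model dynamic shapes for torch.export.

    Whisper's encoder output length is fixed by the mel spectrogram, so all
    export inputs can stay static; T5-style models need the encoder output
    sequence length marked dynamic with a concrete upper bound."""
    if is_whisper:
        return None
    return {
        "decoder_input_ids": None,
        "encoder_hidden_states": {1: seq_len_dim},
        "cache_position": None,
    }

print(resolve_dynamic_shapes(True, "dim"))   # None
print(resolve_dynamic_shapes(False, "dim"))
```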
guangy10
left a comment
Putting it back in your queue. Feel free to ping me for review on Discord or tag me here.
Re-run tests on CI
Looks great! Can you fix the linter by running …
Co-authored-by: Guang Yang <[email protected]>
Hi!
First of all, thanks for taking the time to build this project. I think it's a great step forward towards a better DX with ExecuTorch!
As discussed on the ET Discord in the React Native ExecuTorch channel, this PR makes it possible to export Whisper using optimum-executorch.