Conversation

@chmjkb
Contributor

@chmjkb chmjkb commented Apr 4, 2025

Hi!
First of all, thanks for taking the time to build this project. I think it is a great step toward a better DX with ExecuTorch!
As discussed on the ET Discord in the React Native ExecuTorch channel, this PR makes it possible to export Whisper using optimum-executorch.

Collaborator

@guangy10 guangy10 left a comment

@chmjkb Can you add a test for the Whisper model under optimum-executorch/tests/? The test should be straightforward; we only test two things: 1) export to ET, and 2) an e2e task using the PTE via the HF API.

Once we have the test, @michaelbenayoun can help kick off the CI
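
[Editor's note: for illustration, a minimal sketch of what such a two-check test might look like. The class name ExecuTorchModelForSpeechSeq2Seq, its import path, and the recipe value are assumptions made for illustration, not necessarily what this PR adds.]

# Hypothetical sketch of the two checks described above (not the PR's
# actual test); class name and recipe value are assumed.
import unittest

from optimum.executorch import ExecuTorchModelForSpeechSeq2Seq  # assumed name

class TestWhisperExport(unittest.TestCase):
    def test_whisper_export_to_executorch(self):
        # 1) Export: from_pretrained lowers the HF checkpoint to a PTE program.
        model = ExecuTorchModelForSpeechSeq2Seq.from_pretrained(
            "openai/whisper-tiny", recipe="xnnpack"
        )
        self.assertIsNotNone(model)
        # 2) E2E: run the model's HF-style generation API on a log-mel
        #    spectrogram and assert the transcription is non-empty (the
        #    exact method name is PR-specific, so it is omitted here).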

@chmjkb
Contributor Author

chmjkb commented Apr 7, 2025

Hi @guangy10, I rebased and added the test, lmk if that looks good 👍🏻

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 248 to 250
assert encoder_input_ids.shape == torch.Size(
[1, 80, 3000]
), f"Whisper only accepts a log-mel spectrogram of shape [1, 80, 3000], passed shape: {encoder_input_ids.shape}"
Collaborator

@chmjkb This is the config for whisper-tiny only, right? If I switch to a different Whisper variant, e.g. whisper-large-v3, it won't work. I think we can load this dynamically from preprocessor_config.json?
IIUC, the dims of encoder_input_ids represent [batch_size, feature_size, nb_max_frames]?

Contributor Author

@chmjkb chmjkb Apr 7, 2025

Yeah, I think whisper-large is the exception: it takes 128 features instead of 80. Will fix that. (The smaller ones should work with 80.)

Contributor Author

@chmjkb chmjkb Apr 8, 2025

Actually, I'm thinking about getting rid of this assertion. Instead, I could just resolve the correct shape when instantiating example_encoder_input_ids (I'm doing this anyway). If a user passes the wrong shape for some reason, so be it.
Also, WhisperEncoder itself raises a ValueError when the feature length is incorrect.
WDYT?
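
[Editor's note: for illustration, a minimal sketch of resolving that shape from the preprocessor config instead of hard-coding it, assuming the feature_size and nb_max_frames attributes of the transformers feature extractor; not necessarily the PR's exact code.]

# Sketch: derive the expected log-mel shape from preprocessor_config.json
# rather than hard-coding [1, 80, 3000]. WhisperFeatureExtractor exposes
# feature_size (80 for the smaller variants, 128 for large-v3) and
# nb_max_frames (3000 for the standard 30 s window).
import torch
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("openai/whisper-tiny")
example_encoder_input_ids = torch.rand(
    (1, feature_extractor.feature_size, feature_extractor.nb_max_frames),
    dtype=torch.float32,
)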

Comment on lines 274 to 283
if isinstance(self.full_model, WhisperForConditionalGeneration):
dynamic_shapes = None
else:
# Define dynamic dimension for encoder output sequence length
encoder_seq_len_dim = torch.export.Dim("encoder_hidden_seq_length", max=self.max_hidden_seq_length)
dynamic_shapes = {
"decoder_input_ids": None,
"encoder_hidden_states": {1: encoder_seq_len_dim},
"cache_position": None,
}
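
[Editor's note: for context, a minimal sketch of how a dynamic_shapes dict like this is consumed by torch.export. The toy decoder signature below is assumed for illustration and is not the exporter's real call site.]

import torch

class ToyDecoder(torch.nn.Module):
    def forward(self, decoder_input_ids, encoder_hidden_states, cache_position):
        return (
            encoder_hidden_states.sum(dim=1)
            + decoder_input_ids.float().mean()
            + cache_position.float().mean()
        )

# Dim 1 of encoder_hidden_states may vary at runtime up to the given max;
# entries set to None stay fully static.
seq_dim = torch.export.Dim("encoder_hidden_seq_length", max=4096)
ep = torch.export.export(
    ToyDecoder(),
    (
        torch.zeros(1, 1, dtype=torch.long),
        torch.randn(1, 8, 384),
        torch.tensor([0]),
    ),
    dynamic_shapes={
        "decoder_input_ids": None,
        "encoder_hidden_states": {1: seq_dim},
        "cache_position": None,
    },
)
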
Collaborator

@tugsbayasgalan @pianpwk Can we use Dim.AUTO here, to avoid setting dynamic_shapes explicitly for different models? In this case, Whisper would expect all static shapes, and T5 would want at least encoder_hidden_seq_length to be dynamic.

Collaborator

Yep, Dim.AUTO would be perfect here. That way, you don't need the if/else branching.

Contributor Author

@chmjkb chmjkb Apr 8, 2025

I tried changing the code to the following:

dynamic_shapes = {
    "decoder_input_ids": None,
    "encoder_hidden_states": {1: torch.export.Dim.AUTO},
    "cache_position": None
}

Unfortunately, when I do that, the export fails with the following error:
RuntimeError: Cannot evaluate the shape upper bound of a dynamic-shaped tensor to a concrete bounded integer. Got tensor spec: TensorSpec(dtype=torch.float32, shape=[1, s0, 384], layout=torch.strided, is_sparse=False, shape_dynamism=1, const=False, requires_grad=True). The upper bound shape we get [1, int_oo, 384], the upper bound stride we get [int_oo, 384, 1]. This tensor could either be from 1. a data-dependent operation such as nonzero, or 2. an input that doesn't have a constraint for the upper bound. Please use export's constrain_as_size() or constrain_as_value() APIs and set a concrete upper bound to resolve this.

However, doing:
dynamic_shapes = {"input_ids": {1: torch.export.Dim.AUTO}}
for the Whisper encoder does seem to work; it makes the T5 export fail, though :D
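
[Editor's note: for reference, a self-contained sketch of Dim.AUTO on a toy module, assuming PyTorch 2.4+; not the exporter's code.]

import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Dim.AUTO asks torch.export to infer dynamism from usage: dims that
# trace as constants stay static, the rest become dynamic with an
# unbounded max, which matches the int_oo upper bound the ExecuTorch
# lowering above rejects when it needs a concrete bound.
ep = torch.export.export(
    Toy(),
    (torch.randn(1, 8, 384),),
    dynamic_shapes={"x": {1: torch.export.Dim.AUTO}},
)
print(ep)

One workaround consistent with the error message would be to keep an explicit Dim with max=self.max_hidden_seq_length for the decoder path, as in the existing branch quoted above.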

Collaborator

@guangy10 guangy10 left a comment

Putting it back in your queue. Feel free to ping me on Discord for a review, or tag me here.

@chmjkb chmjkb requested a review from guangy10 April 8, 2025 15:18
@guangy10
Collaborator

guangy10 commented Apr 9, 2025

Re-run tests on CI

@guangy10
Collaborator

guangy10 commented Apr 9, 2025

Looks great! Can you fix the linter by running make style? After that, we can merge this PR, provided @michaelbenayoun doesn't have additional comments.

chmjkb and others added 2 commits April 10, 2025 10:52
@guangy10 guangy10 merged commit 5b88123 into huggingface:main Apr 10, 2025
200 of 218 checks passed