@jiafatom
Contributor

Description

Fix quantization in Whisper model export

Motivation and Context

As titled.

Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

…o_onnx.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@kunal-vaishnavi
Contributor

Since the big models CI pipeline was deleted, we should verify that these commands all still work.

@jiafatom
Contributor Author

jiafatom commented Oct 19, 2025

> Since the big models CI pipeline was deleted, we should verify that these commands all still work.

I verified that many cases work. The issue is the --model_impl openai case.
When we run
python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --model_impl openai --output wtiny-fp32-cpu-oai --precision fp32 --provider cpu --overwrite --use_external_data_format --optimize_onnx --no_beam_search_op --output_cross_qk ;
we see
TypeError: scaled_dot_product_attention(): argument 'is_causal' must be bool, not Tensor

However, this openai-whisper bug is already mentioned in the script:

       # Run forward pass
       # NOTE: There is a bug with openai-whisper==20240930 with the introduction of SDPA.
       # In the Whisper codebase, the following line
       #
       # is_causal = mask is not None and n_ctx > 1
       #
       # has been added where `mask` is a torch tensor. The right-hand side evaluates to `tensor(True/False)`
       # but `is_causal` only accepts the boolean value. The fix is to apply `.item()` after the right-hand
       # side has been evaluated. In other words, the line should be
       #
       # is_causal = (mask is not None and n_ctx > 1).item()
       #
       # instead.
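
The failure mode described in that comment can be sketched without installing torch. The following is a minimal illustration, not the Whisper code itself: `FakeTensor` and `sdpa_stub` are hypothetical stand-ins for a 0-dim `torch.Tensor` and for the strict type check on `is_causal`, and the assumption that the `n_ctx > 1` comparison is the tensor-valued operand during export is mine, not something stated in the script. It shows that Python's `and` returns its last operand rather than a bool, and that coercing the result restores a plain bool.

```python
class FakeTensor:
    """Hypothetical stand-in for a 0-dim torch.Tensor holding one boolean."""

    def __init__(self, value):
        self._value = bool(value)

    def __bool__(self):
        return self._value

    def item(self):
        # Mirrors Tensor.item(): returns the plain Python value.
        return self._value


def sdpa_stub(is_causal):
    """Hypothetical stub that only mimics a strict bool check on `is_causal`."""
    if not isinstance(is_causal, bool):
        raise TypeError(
            "argument 'is_causal' must be bool, not " + type(is_causal).__name__
        )
    return is_causal


def is_causal_buggy(mask, n_ctx_gt_1):
    # `a and b` returns `b` itself when `a` is truthy, so a tensor-like
    # right operand leaks through instead of a Python bool.
    return mask is not None and n_ctx_gt_1


def is_causal_fixed(mask, n_ctx_gt_1):
    # bool(...) also covers the short-circuit case where the expression is
    # already a plain False (a Python bool has no .item() method).
    return bool(mask is not None and n_ctx_gt_1)


mask = object()                 # any non-None value
comparison = FakeTensor(True)   # assumed tensor-valued result of `n_ctx > 1`

try:
    sdpa_stub(is_causal_buggy(mask, comparison))
except TypeError as exc:
    print("buggy:", exc)

print("fixed:", sdpa_stub(is_causal_fixed(mask, comparison)))
```

Whether `bool(...)` or `.item()` is the better choice inside the real library depends on how the export traces the value, so treat this only as an explanation of the error message.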

@kunal-vaishnavi
Contributor

For the --model_impl openai case, can you verify it works with the version of openai-whisper in requirements.txt?

Separately, an internal customer mentioned there were export issues with torch==2.9.0 while export succeeded with torch==2.7.0.

Let's pin the torch version in requirements.txt to 2.7.0.

@jiafatom
Contributor Author

python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --model_impl openai --output wtiny-fp32-cpu-oai --precision fp32 --provider cpu --overwrite --use_external_data_format --optimize_onnx --no_beam_search_op --output_cross_qk

Tested with openai-whisper==20240927 and it works!
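
Since the outcome in this thread depends on which package versions are installed, a quick pre-flight check before running the export may help. The sketch below is an assumption-laden illustration: the pins in `ASSUMED_PINS` are copied from this discussion (the suggested torch 2.7.0 and the openai-whisper version reported as tested), not read from the actual requirements.txt, and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pins taken from the discussion, not from the real requirements.txt.
ASSUMED_PINS = {"torch": "2.7.0", "openai-whisper": "20240927"}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Wheels may carry a local suffix such as "+cpu"; compare the public part.
        public = installed.split("+", 1)[0] if installed else None
        if public != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(ASSUMED_PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The `get_version` parameter exists only so the lookup can be replaced in tests; by default the helper queries the current environment through `importlib.metadata.version`.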

…o_onnx.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
jiafatom and others added 2 commits October 19, 2025 19:32
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@kunal-vaishnavi kunal-vaishnavi merged commit ed7847c into main Oct 20, 2025
95 of 100 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the whisper_quant branch October 20, 2025 18:24
JonathanC-ARM pushed a commit to JonathanC-ARM/onnxruntime that referenced this pull request Oct 24, 2025
### Description
Fix quantization in Whisper model export



### Motivation and Context
As titled.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
fs-eire pushed a commit that referenced this pull request Oct 24, 2025
quic-tirupath pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Oct 27, 2025
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025