
Issue with converting Whisper model to ONNX #1040

Open

AvivSham opened this issue Nov 19, 2024 · 6 comments

Labels: bug (Something isn't working)

System Info

Created a new environment from the following requirements file:

transformers[torch]==4.46.1
onnxruntime==1.19.2
optimum==1.23.3
onnx==1.16.2
onnxconverter-common==1.14.0
tqdm==4.66.5
onnxslim==0.1.36
--extra-index-url https://pypi.ngc.nvidia.com
onnx_graphsurgeon==0.3.27

System info:
Mac M2
Converting on the CPU device
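
For completeness, the environment was set up roughly as follows (assuming the list above is saved as requirements.txt):

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt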

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

We are attempting to convert whisper-small using the HF model openai/whisper-small by executing the command specified in the README file.
python -m scripts.convert --quantize --model_id openai/whisper-small

We get the following trace:

TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  elif len(self.key_cache[layer_idx]) == 0:  # fills previously skipped layers; checking for tensor causes errors
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
	model.decoder.embed_tokens.weight: {'model.decoder.embed_tokens.weight'}
	proj_out.weight: {'onnx::MatMul_3259'}
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
	model.decoder.embed_tokens.weight: {'model.decoder.embed_tokens.weight'}
	proj_out.weight: {'onnx::MatMul_2910'}
		-[x] values not close enough, max diff: 0.024361729621887207 (atol: 0.001)
		-[x] values not close enough, max diff: 6.988886833190918 (atol: 0.001)
		-[x] values not close enough, max diff: 5.208465576171875 (atol: 0.001)
		-[x] values not close enough, max diff: 1.9965003728866577 (atol: 0.001)
		-[x] values not close enough, max diff: 1.4132819175720215 (atol: 0.001)
		-[x] values not close enough, max diff: 0.8667690753936768 (atol: 0.001)
		-[x] values not close enough, max diff: 3.7726752758026123 (atol: 0.001)
		-[x] values not close enough, max diff: 2.159898519515991 (atol: 0.001)
		-[x] values not close enough, max diff: 12.425561904907227 (atol: 0.001)
		-[x] values not close enough, max diff: 1.2728543281555176 (atol: 0.001)
		-[x] values not close enough, max diff: 6.912049770355225 (atol: 0.001)
		-[x] values not close enough, max diff: 1.0248034000396729 (atol: 0.001)
		-[x] values not close enough, max diff: 7.5350022315979 (atol: 0.001)
		-[x] values not close enough, max diff: 1.6307682991027832 (atol: 0.001)
		-[x] values not close enough, max diff: 7.0035505294799805 (atol: 0.001)
		-[x] values not close enough, max diff: 0.8978527784347534 (atol: 0.001)
		-[x] values not close enough, max diff: 5.2730207443237305 (atol: 0.001)
		-[x] values not close enough, max diff: 1.0290248394012451 (atol: 0.001)
		-[x] values not close enough, max diff: 5.59857177734375 (atol: 0.001)
		-[x] values not close enough, max diff: 1.0392111539840698 (atol: 0.001)
		-[x] values not close enough, max diff: 4.692121505737305 (atol: 0.001)
		-[x] values not close enough, max diff: 1.080666184425354 (atol: 0.001)
		-[x] values not close enough, max diff: 2.687824249267578 (atol: 0.001)
		-[x] values not close enough, max diff: 1.6337403059005737 (atol: 0.001)
		-[x] values not close enough, max diff: 2.598097801208496 (atol: 0.001)
		-[x] values not close enough, max diff: 1.6576173305511475 (atol: 0.001)
Validation for the model models/openai/whisper-small/encoder_model.onnx raised: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.001:
- last_hidden_state: max diff = 0.024361729621887207
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.001:
- logits: max diff = 6.988886833190918
- present.0.decoder.key: max diff = 5.208465576171875
- present.0.decoder.value: max diff = 1.9965003728866577
- present.1.decoder.key: max diff = 1.4132819175720215
- present.1.decoder.value: max diff = 0.8667690753936768
- present.2.decoder.key: max diff = 3.7726752758026123
- present.2.decoder.value: max diff = 2.159898519515991
- present.3.decoder.key: max diff = 12.425561904907227
- present.3.decoder.value: max diff = 1.2728543281555176
- present.4.decoder.key: max diff = 6.912049770355225
- present.4.decoder.value: max diff = 1.0248034000396729
- present.5.decoder.key: max diff = 7.5350022315979
- present.5.decoder.value: max diff = 1.6307682991027832
- present.6.decoder.key: max diff = 7.0035505294799805
- present.6.decoder.value: max diff = 0.8978527784347534
- present.7.decoder.key: max diff = 5.2730207443237305
- present.7.decoder.value: max diff = 1.0290248394012451
- present.8.decoder.key: max diff = 5.59857177734375
- present.8.decoder.value: max diff = 1.0392111539840698
- present.9.decoder.key: max diff = 4.692121505737305
- present.9.decoder.value: max diff = 1.080666184425354
- present.10.decoder.key: max diff = 2.687824249267578
- present.10.decoder.value: max diff = 1.6337403059005737
- present.11.decoder.key: max diff = 2.598097801208496
- present.11.decoder.value: max diff = 1.6576173305511475.
 The exported model was saved at: models/openai/whisper-small

None of the outputs meets the default tolerance, and for most of them the difference is more than three orders of magnitude above it.
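
For anyone who wants to double-check this outside the exporter's own validation, a minimal comparison of the PyTorch encoder against the exported encoder_model.onnx could look like the sketch below (the ONNX input name input_features is assumed from the default export):

import numpy as np
import torch
import onnxruntime as ort
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-small"
model = WhisperForConditionalGeneration.from_pretrained(model_id).eval()
processor = WhisperProcessor.from_pretrained(model_id)

# 30 s of silence at 16 kHz -> log-mel features of shape (1, 80, 3000)
audio = np.zeros(16000 * 30, dtype=np.float32)
features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Reference output from the PyTorch encoder
with torch.no_grad():
    ref = model.model.encoder(features).last_hidden_state.numpy()

# Output from the exported (pre-quantization) ONNX encoder
sess = ort.InferenceSession("models/openai/whisper-small/encoder_model.onnx")
onnx_out = sess.run(None, {"input_features": features.numpy()})[0]

print("max abs diff:", np.abs(ref - onnx_out).max())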
@xenova can you please help with this?

Thanks,

Reproduction

just run:
python -m scripts.convert --quantize --model_id openai/whisper-small

@AvivSham AvivSham added the bug Something isn't working label Nov 19, 2024
@AvivSham AvivSham changed the title Issue with converting whisper model to ONNX Issue with converting Whisper model to ONNX Nov 19, 2024
xenova (Collaborator) commented Nov 21, 2024

Thanks @AvivSham, I am able to reproduce the issue. The same thing happens with other Whisper variants. @echarlaix this looks to be an issue with Optimum, as I'm able to reproduce it with optimum-cli. 👀 Any idea what's up?
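
For reference, the exact command isn't shown here; a typical optimum-cli export invocation for this model would be something like:

optimum-cli export onnx --model openai/whisper-small whisper-small-onnx/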

AvivSham (Author):

bumping...

xenova (Collaborator) commented Nov 28, 2024

@AvivSham In the meantime, can you try downgrading transformers to 4.38.2 (i.e., "transformers_version": "4.38.2")?
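
For example, assuming a pip-based setup:

pip install "transformers[torch]==4.38.2"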

AvivSham (Author) commented Dec 2, 2024

We downgraded transformers to 4.38.2, and still none of the model versions (small/medium/large) meets the threshold. The trace looks much the same for all versions:

Found different candidate ONNX initializers (likely duplicate) for the tied weights:
        model.decoder.embed_tokens.weight: {'model.decoder.embed_tokens.weight'}
        proj_out.weight: {'onnx::MatMul_8717'}
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
        model.decoder.embed_tokens.weight: {'model.decoder.embed_tokens.weight'}
        proj_out.weight: {'onnx::MatMul_7406'}
                -[x] values not close enough, max diff: 0.006899833679199219 (atol: 0.001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.001:
- last_hidden_state: max diff = 0.006899833679199219.
 The exported model was saved at: models/openai/whisper-medium

However, we see two differences:

  1. The validation is less informative (or perhaps it's just the logging): only a single output is checked against the atol threshold.
  2. The difference from the threshold is much smaller.

@xenova

xenova (Collaborator) commented Dec 2, 2024

@AvivSham Those differences are negligible, and the model will produce similar results to the Python version!

Looks like we need to investigate what broke in a recent update to transformers. cc @echarlaix

AvivSham (Author) commented Dec 2, 2024

@echarlaix @xenova thanks!

Do you know why the logs are less informative? I recall from Optimum that layers meeting the threshold are also printed with a checkmark; here the log covers only a single layer.

I also have a follow-up question, which I also asked here: #917 (comment).
When we try to use the converted model with Whisper Web, we get a cache_position-related error; it affects others as well (#917 (comment)).
This issue does not reproduce in a Python environment, as you can see in the code snippet we added (see the first comment link).
