
Where should I get decoder_model_merged file from? #917

Open
abuchnick-aiola opened this issue Sep 2, 2024 · 8 comments
Labels: question (Further information is requested)

Comments

@abuchnick-aiola

Question

Hey,
I'm trying to use the whisper-web demo with my fine-tuned model.
After I managed to connect my model to the demo application, I started getting errors related to this:

https://github.com/xenova/transformers.js/blob/7f5081da29c3f77ee830269ab801344776e61bcb/src/models.js#L771

Basically, when transformers.js tries to load a Whisper model, it looks for files named decoder_model_merged.onnx, decoder_model_merged_quantized.onnx, or decoder_model_merged_fp16.onnx.
The thing is, the conversion script didn't create any of these files.
This is what the conversion script's output looks like:
[screenshot: conversion script output]
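For context, a successful export for the v2 loader would be expected to produce a layout along these lines under the model directory (an illustrative sketch inferred from the loader code linked above, not from my actual script output):

config.json
generation_config.json
preprocessor_config.json
tokenizer.json
onnx/encoder_model.onnx
onnx/encoder_model_quantized.onnx
onnx/decoder_model_merged.onnx
onnx/decoder_model_merged_quantized.onnx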

Please help me figure out what I'm missing here.
P.S. Once I get it working, I'll be happy to open a PR on the whisper-web repository that enables using local models alongside remote (HF Hub) models.
Thanks!

@abuchnick-aiola (Author)

I think it could be related to: xenova/whisper-web#24

@xenova (Collaborator) commented Sep 2, 2024

Can you try with the Transformers.js v3 conversion script?

git clone -b v3 https://github.com/xenova/transformers.js.git
cd transformers.js
pip install -q -r scripts/requirements.txt
python -m scripts.convert --quantize --model_id MODEL_ID_GOES_HERE
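To verify the export afterwards, a quick check could look like this (a minimal sketch, assuming the script's default output location of ./models/<model_id>/onnx; the placeholder id is the same one used in the command above):

import os

model_id = "MODEL_ID_GOES_HERE"  # placeholder, as in the convert command
# A successful export should list decoder_model_merged*.onnx among these files
print(sorted(os.listdir(f"models/{model_id}/onnx")))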

@abuchnick-aiola (Author) commented Sep 3, 2024

Hey @xenova,
Sure, I'll give it a try.
Does that mean I should also update my transformers.js version? It's currently at ^2.7.0, according to the package.json of the whisper-web project.
Thanks!

@abuchnick-aiola (Author)

Hey @xenova and everyone else who ends up here,
The problem was with how I ran the conversion script. This is how it should be done:

python -m scripts.convert --quantize --model_id MODEL_ID_GOES_HERE --task automatic-speech-recognition-with-past
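(As I understand it, and this is my reading rather than documented behavior, the -with-past task suffix makes the exporter also produce a decoder variant with past-key-values support, which the script then merges into the decoder_model_merged.onnx file the loader was asking for.)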

@abuchnick-aiola (Author)

Hey @xenova,
Now I'm getting: 'An error occurred during model execution: "Missing the following inputs: cache_position."'.
What could be the issue?
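One way to confirm which inputs the exported decoder actually declares (a diagnostic sketch, assuming the onnx package and the export at its default path):

import onnx

# Load only the graph structure; skip external weight data for speed
model = onnx.load("onnx/decoder_model_merged.onnx", load_external_data=False)
# Graph inputs minus initializers are the feeds the runtime must supply
initializers = {init.name for init in model.graph.initializer}
print([i.name for i in model.graph.input if i.name not in initializers])
# If 'cache_position' shows up here, the JS side has to provide it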

@abuchnick-aiola (Author) commented Sep 8, 2024

@xenova one last update (in the meantime):
By reverse-engineering the onnx-community/whisper-* artifacts you uploaded to HF, we found that two things were causing this issue:

  1. Using the conversion script from the v3 branch with a large Whisper model led to the cache_position exception I attached above. It seems to be related to cache_position becoming a required input in the newer transformers version (as pinned in the conversion scripts' requirements file), which the transformers.js and whisper-web (webgpu branch) code doesn't yet take into account.
  2. No matter what we tried (both the development and v3 branches), ONNX conversion of any of the large Whisper variants simply doesn't work. It fails the ATOL validation regardless of the atol we provided (even with an ATOL of 1). This happened only with the full-precision (fp32) conversion, which is required to run whisper-webgpu. When we tried lower precision (fp16) in the encoder, we got tons of exclamation marks in the model's responses (see the sketch after this list).
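A quick way to probe the fp16 encoder for overflow (a hedged sketch, assuming onnxruntime, the default export path, and an 80-mel-bin Whisper variant; exclamation-mark output is a common symptom of inf/NaN activations):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("onnx/encoder_model_fp16.onnx")
# Whisper encoders take (batch, mel_bins, frames); 80 x 3000 covers 30 s of audio.
# Depending on how the graph was exported, the input dtype may need to be float32.
features = np.random.randn(1, 80, 3000).astype(np.float16)
(hidden,) = session.run(None, {"input_features": features})
print("inf:", np.isinf(hidden).any(), "nan:", np.isnan(hidden).any())

Random features won't necessarily reproduce the overflow that real audio triggers, but inf/NaN here would confirm the fp16 graph itself is unstable.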

I would love to hear any further feedback from you, as we really want to integrate transformers.js into our codebase, but the issues above are currently blockers for us.

Thank you very much for your work!

@decoder-sh-david

I am also curious to get more information about the conversion flow. Specifically, I'd like to know how timestamped models like this one were trained.

I have also run into issues with lots of quantization variants simply not working.

@AvivSham

> Hey @xenova, Now I'm getting: 'An error occurred during model execution: "Missing the following inputs: cache_position."'. What could be the issue?

@xenova

Can you please help resolve this issue?
For me, if I run inference on the converted Whisper model through Python, it works; however, when doing the same through whisper-web / transformers.js, I receive this error message.

The Python inference code (I just changed the model path):

from transformers import AutoProcessor, pipeline
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from datasets import load_dataset

# Load the processor and the ONNX-exported model (model path swapped for ours)
processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")
speech_recognition = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Transcribe a sample clip from a small test dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
pred = speech_recognition(ds[0]["audio"]["array"])
print(pred["text"])
