Support speech recognition with whisper models and seq2seq. #704
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@emascarenhas Can you run the following from the root of the repo to make the code style check pass please?
Yes, this shows no errors now.
@regisss, let me know if there are any additional actions for me.
libinta
left a comment
@emascarenhas can you add a CI test also?
I can. Can you provide an example of someone adding one, like a specific commit? It will be a lot faster to add it with an example.
I'll add the test in |
@regisss , python ../gaudi_spawn.py |
Yeah it's a bit tricky without knowing the structure of the test folder sorry 🙁
Perhaps you can use these results for now. I think this is only inference. I can get the numbers for training and inference on Gaudi2 later this week, and these can be added subsequently in another PR.
With Synapse 1.14.0 we have to set --preprocessing_num_workers=1 (with 1.13 we were able to set this to 16). I see that you already made this change. Training takes a really long time, so perhaps you can set some of the options so that the run takes less time. Capturing some of the output here:
[INFO|trainer.py:726] 2024-02-29 17:20:15,657 >> ***** Running training *****
Result: epoch = 38.46
I eventually got a memory allocation failure on this run in the prediction step.
I also updated the commands in the README and added two things to speed up the runs:
I also decreased the batch size for evaluation, as I noticed an out-of-memory error otherwise. I'm going to add a test in the CI (even though the WER may be bad, I'll find better hyperparameters later) and merge this PR today.
@regisss, thanks for your help on this!
regisss
left a comment
LGTM!
I added regression tests for Gaudi1 and Gaudi2. We can probably find better hyperparameters to get a much better WER but we can do that later.
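For readers unfamiliar with the metric mentioned above: WER (word error rate) is commonly defined as the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The snippet below is only a minimal sketch of that definition. The function name `word_error_rate` is hypothetical, and this is not the metric code used by the example script or the regression tests in this PR.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name); it applies no text
    normalization, so casing and punctuation differences count as errors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many extra words, which is one reason an untuned run can report a poor WER.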
Yes. Thank you!
…ace#704) Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: edward.mascarenhas <emascare@gaudi-user-hf-1.amr.corp.intel.com>
…uggingface#704) Signed-off-by: Urszula <urszula.golowicz@intel.com> Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
What does this PR do?
Adds examples of running speech recognition on different datasets with Whisper seq2seq transformer models.
Before submitting