
Support speech recognition with whisper models and seq2seq. #704

Merged
regisss merged 22 commits into huggingface:main from emascarenhas:main on Mar 2, 2024

Conversation

@emascarenhas
Contributor

What does this PR do?

Adds examples of running speech recognition with different datasets, using Whisper and seq2seq transformer models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

emascarenhas and others added 7 commits February 23, 2024 22:20
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@regisss
Collaborator

regisss commented Feb 26, 2024

@emascarenhas Can you run the following from the root of the repo to make the code style check pass please?

pip install -U ruff
make style

@emascarenhas
Contributor Author

> @emascarenhas Can you run the following from the root of the repo to make the code style check pass please?
>
> pip install -U ruff
> make style

Yes, this shows no errors now.

@emascarenhas
Contributor Author

@regisss, let me know if there are any additional actions for me.

Collaborator

@libinta libinta left a comment

@emascarenhas can you also add a CI test?

@emascarenhas
Contributor Author

> @emascarenhas can you also add a CI test?

I can. Can you provide an example of someone adding one, such as a specific commit? Having an example will make it a lot faster to add.

@regisss
Collaborator

regisss commented Feb 28, 2024

> > @emascarenhas can you also add a CI test?
>
> I can. Can you provide an example of someone adding one, such as a specific commit? Having an example will make it a lot faster to add.

I'll add the test in test_examples.py, as this script has become big and hard to understand.
What I would need is a new JSON file called whisper_large.json in optimum-habana/tests/baselines, similar to this one: https://github.com/huggingface/optimum-habana/blob/main/tests/baselines/wav2vec2_large_lv60.json
Basically, this file gathers the hyperparameters (batch size, learning rate), the metrics to check (throughput, accuracy, etc.) and the training args to pass.
If you cannot run the example on Gaudi1, that's fine and I'll run it.
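
As a rough illustration of what such a baseline file might hold, the sketch below bundles the three kinds of entries mentioned above and serializes them to JSON. The key names (`hyperparameters`, `metrics`, `extra_arguments`), the helper `build_baseline` and all values are assumptions made for illustration only; the real layout should be copied from the linked wav2vec2_large_lv60.json.

```python
import json


def build_baseline(hyperparameters, metrics, extra_arguments):
    """Bundle the three kinds of entries into one JSON-serializable dict.

    The key names are hypothetical, not the actual optimum-habana schema.
    """
    return {
        "hyperparameters": dict(hyperparameters),
        "metrics": dict(metrics),
        "extra_arguments": list(extra_arguments),
    }


baseline = build_baseline(
    # Placeholder values, to be replaced by the ones used in the real run.
    hyperparameters={"train_batch_size": 16, "learning_rate": 1e-5},
    # Left empty here; these would be filled in from a reference run.
    metrics={"throughput": None, "accuracy": None},
    # Extra flags forwarded to the example script.
    extra_arguments=["--bf16", "--predict_with_generate"],
)

baseline_json = json.dumps(baseline, indent=4)
print(baseline_json)
```

Keeping the reference metrics next to the arguments that produced them means a regression test only needs this one file to rerun the example and compare results.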

emascarenhas and others added 4 commits February 27, 2024 20:24
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
@emascarenhas
Contributor Author

> > > @emascarenhas can you also add a CI test?
> >
> > I can. Can you provide an example of someone adding one, such as a specific commit? Having an example will make it a lot faster to add.
>
> I'll add the test in test_examples.py, as this script has become big and hard to understand. What I would need is a new JSON file called whisper_large.json in optimum-habana/tests/baselines, similar to this one: https://github.com/huggingface/optimum-habana/blob/main/tests/baselines/wav2vec2_large_lv60.json Basically, this file gathers the hyperparameters (batch size, learning rate), the metrics to check (throughput, accuracy, etc.) and the training args to pass. If you cannot run the example on Gaudi1, that's fine and I'll run it.

@regisss,
Thanks for the tip.
How do I decide which of the several parameters in the README.md to specify? The whisper large example has these parameters for training and evaluation, and I'm wondering whether I should specify all of them in the extra_arguments in the JSON file. Is it OK to do only an inference test, or do you need both inference and training?

python ../gaudi_spawn.py \
    --world_size 8 --use_mpi run_speech_recognition_seq2seq.py \
    --model_name_or_path="openai/whisper-large" \
    --dataset_name="mozilla-foundation/common_voice_11_0" \
    --dataset_config_name="hi" \
    --language="hindi" \
    --train_split_name="train+validation" \
    --eval_split_name="test" \
    --gaudi_config_name="Habana/whisper" \
    --max_steps="5000" \
    --output_dir="./results/whisper-large-hi" \
    --per_device_train_batch_size="16" \
    --gradient_accumulation_steps="2" \
    --per_device_eval_batch_size="16" \
    --logging_steps="25" \
    --learning_rate="1e-5" \
    --warmup_steps="500" \
    --evaluation_strategy="steps" \
    --eval_steps="1000" \
    --save_strategy="steps" \
    --save_steps="1000" \
    --generation_max_length="225" \
    --preprocessing_num_workers="16" \
    --length_column_name="input_length" \
    --max_duration_in_seconds="30" \
    --text_column_name="sentence" \
    --freeze_feature_encoder="False" \
    --gradient_checkpointing \
    --group_by_length \
    --bf16 \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --use_habana \
    --use_hpu_graphs_for_inference

@regisss
Collaborator

regisss commented Feb 28, 2024

Yeah it's a bit tricky without knowing the structure of the test folder sorry 🙁
Can you let me know the results you get for the following metrics with the command you gave?

  • train_runtime
  • train_samples_per_second
  • eval_samples_per_second
  • eval_wer

@emascarenhas
Contributor Author

> Yeah it's a bit tricky without knowing the structure of the test folder sorry 🙁 Can you let me know the results you get for the following metrics with the command you gave?
>
>   • train_runtime
>   • train_samples_per_second
>   • eval_samples_per_second
>   • eval_wer

Perhaps you can use these results for now. I think this is inference only; I can get the training and inference numbers on Gaudi2 later this week, and they can be added subsequently in another PR.
***** eval metrics *****
eval_loss = 0.7318
eval_runtime = 0:57:04.04
eval_samples = 2894
eval_samples_per_second = 0.845
eval_steps_per_second = 0.053
eval_wer = 2.2266
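
The throughput figures above can be cross-checked from the sample count and the runtime. The sketch below is only a sanity check: `runtime_to_seconds` is a hypothetical helper, and the step count assumes batches of 16 counted on a single device, which is inferred from the numbers rather than stated in this thread.

```python
def runtime_to_seconds(runtime: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


eval_samples = 2894
eval_batch_size = 16  # per_device_eval_batch_size from the command above
eval_seconds = runtime_to_seconds("0:57:04.04")

samples_per_second = eval_samples / eval_seconds

# Assumption: the last partial batch still counts as one step.
eval_steps = -(-eval_samples // eval_batch_size)  # ceiling division
steps_per_second = eval_steps / eval_seconds

print(round(samples_per_second, 3), round(steps_per_second, 3))  # 0.845 0.053
```

Both values agree with the reported eval_samples_per_second and eval_steps_per_second.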

@emascarenhas
Contributor Author

With Synapse 1.14.0 we have to set --preprocessing_num_workers=1 (with 1.13 we were able to set this to 16). I see that you already made this change.

Training takes a really long time, so perhaps you can set some of the options so that the run takes less time.
For example, running this:

python ../gaudi_spawn.py --world_size 8 --use_mpi run_speech_recognition_seq2seq.py --model_name_or_path=openai/whisper-large --dataset_name=mozilla-foundation/commoset_config_name=hi --language=hindi --train_split_name=train+validation --eval_split_name=test --gaudi_config_name=Habana/whisper --max_steps=1000 --output_dir=/home/t/whisper-large-hi --per_device_train_batch_size=16 --gradient_accumulation_steps=2 --per_device_eval_batch_size=16 --logging_steps=25 --learning_rate=1e-5 --warmup_ste_strategy=steps --eval_steps=500 --save_strategy=steps --save_steps=500 --generation_max_length=225 --preprocessing_num_workers=1 --length_column_name=input_length --mnds=30 --text_column_name=sentence --freeze_feature_encoder=False --gradient_checkpointing --group_by_length --bf16 --overwrite_output_dir --do_train --do_eval --prediuse_habana --use_hpu_graphs_for_inference --use_lazy_mode

Capturing some of the output here:

[INFO|trainer.py:726] 2024-02-29 17:20:15,657 >> ***** Running training *****
[INFO|trainer.py:727] 2024-02-29 17:20:15,657 >> Num examples = 6,540
[INFO|trainer.py:728] 2024-02-29 17:20:15,657 >> Num Epochs = 39
[INFO|trainer.py:729] 2024-02-29 17:20:15,657 >> Instantaneous batch size per device = 16
[INFO|trainer.py:732] 2024-02-29 17:20:15,657 >> Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:733] 2024-02-29 17:20:15,657 >> Gradient Accumulation steps = 2
[INFO|trainer.py:734] 2024-02-29 17:20:15,657 >> Total optimization steps = 1,000
[INFO|trainer.py:735] 2024-02-29 17:20:15,660 >> Number of trainable parameters = 1,541,384,960
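
The batch size and epoch count in this log follow from the command-line values. The sketch below reproduces them under two assumptions that are not stated in the thread: each device receives ceil(num_examples / world_size) samples, and the last partial batch of an epoch is kept rather than dropped.

```python
import math

num_examples = 6540
world_size = 8                   # --world_size
per_device_batch_size = 16       # --per_device_train_batch_size
gradient_accumulation_steps = 2  # --gradient_accumulation_steps
max_steps = 1000                 # --max_steps

# Samples consumed per optimizer update across all devices.
total_batch_size = per_device_batch_size * world_size * gradient_accumulation_steps

# Assumed sharding: every device sees the same (rounded-up) number of samples.
samples_per_device = math.ceil(num_examples / world_size)
batches_per_epoch = math.ceil(samples_per_device / per_device_batch_size)
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)

# Number of (possibly partial) epochs needed to reach max_steps.
num_epochs = math.ceil(max_steps / updates_per_epoch)

print(total_batch_size, updates_per_epoch, num_epochs)  # 256 26 39
```

With 26 updates per epoch, 1,000 steps correspond to roughly 38.46 epochs, which is why the run is logged as 39 epochs.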

Result:

epoch = 38.46
max_memory_allocated (GB) = 94.59
memory_allocated (GB) = 26.61
total_memory_available (GB) = 94.62
train_loss = 0.0485
train_runtime = 2:35:01.97
train_samples = 6540
train_samples_per_second = 27.521
train_steps_per_second = 0.108
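
As a sanity check on these numbers, the reported training throughput is consistent with counting max_steps × total batch size samples over the whole runtime. This is a minimal sketch of that arithmetic, using only values from the log and result above; how the metric is actually computed internally is an assumption.

```python
max_steps = 1000
total_train_batch_size = 256  # from the "Total train batch size" log line

# train_runtime = 2:35:01.97 converted to seconds.
train_seconds = 2 * 3600 + 35 * 60 + 1.97

# Assumption: throughput counts every sample processed over all steps,
# not the number of distinct examples in the dataset.
train_samples_per_second = max_steps * total_train_batch_size / train_seconds
train_steps_per_second = max_steps / train_seconds

print(round(train_samples_per_second, 3), round(train_steps_per_second, 3))  # 27.521 0.108
```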

I eventually got a memory allocation failure on this run in the prediction step.

@regisss
Collaborator

regisss commented Mar 1, 2024

I also updated the commands in the README and added two things to speed up the runs:

  • a label_features_max_length training argument that lets you specify a padding size for the label features defined here; otherwise they have many different shapes that regularly trigger compilations
  • --dataloader_num_workers 8, which helps a lot because data loading is often a bottleneck for image and audio inputs
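
To illustrate why a fixed padding size for labels helps, the sketch below compares padding each batch to its longest sequence with padding every batch to one fixed length. It is not the code added in this PR: the function names, the toy batches, the truncation behavior and the -100 padding value are all assumptions for illustration.

```python
PAD_LABEL = -100  # assumed padding value, commonly ignored by the loss


def pad_to_longest(batch):
    """Pad every label sequence to the longest one in this batch."""
    longest = max(len(labels) for labels in batch)
    return [labels + [PAD_LABEL] * (longest - len(labels)) for labels in batch]


def pad_to_fixed(batch, max_length):
    """Pad (or truncate) every label sequence to the same fixed length."""
    return [(labels + [PAD_LABEL] * max_length)[:max_length] for labels in batch]


# Toy label batches with varying sequence lengths.
batches = [
    [[1, 2, 3], [4, 5]],
    [[6, 7, 8, 9, 10], [11]],
    [[12, 13], [14, 15, 16, 17]],
]

# Each distinct (batch, length) shape may trigger a new graph compilation.
dynamic_shapes = {(len(b), len(pad_to_longest(b)[0])) for b in batches}
fixed_shapes = {(len(b), len(pad_to_fixed(b, 8)[0])) for b in batches}

print(len(dynamic_shapes), len(fixed_shapes))  # 3 1
```

The trade-off is some wasted computation on padding tokens in exchange for a single label shape across batches.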

I also decreased the batch size for evaluation as I also noticed an out-of-memory error otherwise.

I'm going to add a test in the CI (even though the WER may be bad, I'll find better hyperparameters later) and merge this PR today.

@emascarenhas
Contributor Author

@regisss, thanks for your help on this!

@regisss regisss added the run-test Run CI for PRs from external contributors label Mar 1, 2024
Collaborator

@regisss regisss left a comment

LGTM!

I added regression tests for Gaudi1 and Gaudi2. We can probably find better hyperparameters to get a much better WER but we can do that later.

@regisss regisss merged commit e9a1e57 into huggingface:main Mar 2, 2024
@emascarenhas
Contributor Author

> LGTM!
>
> I added regression tests for Gaudi1 and Gaudi2. We can probably find better hyperparameters to get a much better WER but we can do that later.

Yes. Thank you!

puneeshkhanna pushed a commit to puneeshkhanna/optimum-habana-fork that referenced this pull request Mar 11, 2024
…ace#704)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: edward.mascarenhas <emascare@gaudi-user-hf-1.amr.corp.intel.com>
HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024
…ace#704)

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: edward.mascarenhas <emascare@gaudi-user-hf-1.amr.corp.intel.com>
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
…uggingface#704)

Signed-off-by: Urszula <urszula.golowicz@intel.com>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 6, 2025
…uggingface#704)

Signed-off-by: Urszula <urszula.golowicz@intel.com>
Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
