Support speech recognition with whisper models and seq2seq. by emascarenhas · Pull Request #704 · huggingface/optimum-habana

emascarenhas · 2024-02-12T04:47:35Z

What does this PR do?

Adds examples of running speech recognition on different datasets with Whisper and other seq2seq transformer models.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2024-02-23T02:26:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss · 2024-02-26T16:53:53Z

@emascarenhas Can you run the following from the root of the repo to make the code style check pass please?

pip install -U ruff
make style

emascarenhas · 2024-02-27T00:42:33Z

Yes this shows no errors now.

emascarenhas · 2024-02-27T21:59:58Z

@regisss, let me know if there are any additional actions for me.

libinta

@emascarenhas can you also add a CI test?

emascarenhas · 2024-02-28T00:54:01Z

I can. Can you point me to an example of someone adding one, such as a specific commit? Having an example will make it a lot faster to add.

regisss · 2024-02-28T01:31:32Z

I'll add the test in test_examples.py, as this script has become big and hard to follow.
What I would need is a new JSON file called whisper_large.json in optimum-habana/tests/baselines, similar to this one: https://github.com/huggingface/optimum-habana/blob/main/tests/baselines/wav2vec2_large_lv60.json
Basically, this file gathers the hyperparameters (batch size, learning rate), the metrics to check (throughput, accuracy, etc.), and the training arguments to pass.
If you cannot run the example on Gaudi1, that's fine and I'll run it.

emascarenhas · 2024-02-28T06:14:00Z

@regisss,
Thanks for the tip.
How do I decide which of the parameters in the README.md to specify? The whisper-large example uses the parameters below for training and evaluation, and I'm wondering whether I should specify all of them in extra_arguments in the JSON file. Is it OK to do only an inference test, or do you need both inference and training?

python ../gaudi_spawn.py \
    --world_size 8 --use_mpi run_speech_recognition_seq2seq.py \
    --model_name_or_path="openai/whisper-large" \
    --dataset_name="mozilla-foundation/common_voice_11_0" \
    --dataset_config_name="hi" \
    --language="hindi" \
    --train_split_name="train+validation" \
    --eval_split_name="test" \
    --gaudi_config_name="Habana/whisper" \
    --max_steps="5000" \
    --output_dir="./results/whisper-large-hi" \
    --per_device_train_batch_size="16" \
    --gradient_accumulation_steps="2" \
    --per_device_eval_batch_size="16" \
    --logging_steps="25" \
    --learning_rate="1e-5" \
    --warmup_steps="500" \
    --evaluation_strategy="steps" \
    --eval_steps="1000" \
    --save_strategy="steps" \
    --save_steps="1000" \
    --generation_max_length="225" \
    --preprocessing_num_workers="16" \
    --length_column_name="input_length" \
    --max_duration_in_seconds="30" \
    --text_column_name="sentence" \
    --freeze_feature_encoder="False" \
    --gradient_checkpointing \
    --group_by_length \
    --bf16 \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --use_habana \
    --use_hpu_graphs_for_inference
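
To make the question concrete, here is a rough sketch of how the command above might be split into the three groups a baseline file is described as holding: hyperparameters, metrics to check, and extra arguments. The key names and the `build_baseline` helper are hypothetical and are not taken from the real files in tests/baselines; only the values come from the command.

```python
import json


# Hypothetical schema: these key names are assumptions for illustration,
# not the layout of the real files in tests/baselines.
def build_baseline(hyperparameters, metric_names, extra_arguments):
    """Group hyperparameters, metrics to check, and extra CLI arguments."""
    return {
        "hyperparameters": dict(hyperparameters),
        # Metric values stay None until they have actually been measured.
        "metrics": {name: None for name in metric_names},
        "extra_arguments": list(extra_arguments),
    }


baseline = build_baseline(
    hyperparameters={
        "learning_rate": 1e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,
    },
    metric_names=[
        "train_runtime",
        "train_samples_per_second",
        "eval_samples_per_second",
        "eval_wer",
    ],
    extra_arguments=[
        "--dataset_config_name hi",
        "--language hindi",
        "--gradient_checkpointing",
        "--bf16",
        "--use_hpu_graphs_for_inference",
    ],
)

print(json.dumps(baseline, indent=2))
```

Under this split, only the arguments that the test harness does not already set would need to go into the extra arguments list.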

regisss · 2024-02-28T06:35:21Z

Yeah, it's a bit tricky without knowing the structure of the test folder, sorry 🙁
Can you let me know the results you get for the following metrics with the command you gave?

train_runtime
train_samples_per_second
eval_samples_per_second
eval_wer

emascarenhas · 2024-02-28T15:15:41Z

Perhaps you can use these results for now. I think this covers only inference; I can get the numbers for training and inference on Gaudi2 later this week, and they can be added in a follow-up PR.
***** eval metrics *****
eval_loss = 0.7318
eval_runtime = 0:57:04.04
eval_samples = 2894
eval_samples_per_second = 0.845
eval_steps_per_second = 0.053
eval_wer = 2.2266
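
As a quick sanity check, the reported throughput can be recomputed from the sample count and the runtime. The sketch below is only arithmetic on the numbers above. The batch size of 16 per evaluation step is an assumption, chosen because it reproduces the reported steps per second.

```python
import math


def hms_to_seconds(hms: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


eval_runtime_s = hms_to_seconds("0:57:04.04")
eval_samples = 2894

samples_per_second = eval_samples / eval_runtime_s

# Assumption: each evaluation step processed one batch of 16 samples.
assumed_batch_size = 16
eval_steps = math.ceil(eval_samples / assumed_batch_size)
steps_per_second = eval_steps / eval_runtime_s

print(round(samples_per_second, 3))  # 0.845
print(round(steps_per_second, 3))    # 0.053
```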

emascarenhas · 2024-03-01T06:08:05Z

With Synapse 1.14.0 we have to set --preprocessing_num_workers=1 (with 1.13 we were able to set this to 16). I see that you already made this change.

Training takes a really long time, so perhaps you can set some of the options so that the run takes less time.
For example, running this:

python ../gaudi_spawn.py --world_size 8 --use_mpi run_speech_recognition_seq2seq.py --model_name_or_path=openai/whisper-large --dataset_name=mozilla-foundation/commoset_config_name=hi --language=hindi --train_split_name=train+validation --eval_split_name=test --gaudi_config_name=Habana/whisper --max_steps=1000 --output_dir=/home/t/whisper-large-hi --per_device_train_batch_size=16 --gradient_accumulation_steps=2 --per_device_eval_batch_size=16 --logging_steps=25 --learning_rate=1e-5 --warmup_ste_strategy=steps --eval_steps=500 --save_strategy=steps --save_steps=500 --generation_max_length=225 --preprocessing_num_workers=1 --length_column_name=input_length --mnds=30 --text_column_name=sentence --freeze_feature_encoder=False --gradient_checkpointing --group_by_length --bf16 --overwrite_output_dir --do_train --do_eval --prediuse_habana --use_hpu_graphs_for_inference --use_lazy_mode

Capturing some of the output here:

[INFO|trainer.py:726] 2024-02-29 17:20:15,657 >> ***** Running training *****
[INFO|trainer.py:727] 2024-02-29 17:20:15,657 >> Num examples = 6,540
[INFO|trainer.py:728] 2024-02-29 17:20:15,657 >> Num Epochs = 39
[INFO|trainer.py:729] 2024-02-29 17:20:15,657 >> Instantaneous batch size per device = 16
[INFO|trainer.py:732] 2024-02-29 17:20:15,657 >> Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:733] 2024-02-29 17:20:15,657 >> Gradient Accumulation steps = 2
[INFO|trainer.py:734] 2024-02-29 17:20:15,657 >> Total optimization steps = 1,000
[INFO|trainer.py:735] 2024-02-29 17:20:15,660 >> Number of trainable parameters = 1,541,384,960
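
The batch size and epoch count in this log follow from the command-line settings. The sketch below is a back-of-the-envelope check, assuming 8 devices (from --world_size 8) and that one epoch takes the number of examples divided by the total batch size, rounded up. The Trainer's exact rounding may differ in other configurations.

```python
import math

per_device_batch_size = 16   # --per_device_train_batch_size
world_size = 8               # --world_size
gradient_accumulation = 2    # --gradient_accumulation_steps
num_examples = 6540          # "Num examples" in the log
max_steps = 1000             # --max_steps

# Samples consumed per optimizer step across all devices.
total_batch_size = per_device_batch_size * world_size * gradient_accumulation

# Optimizer steps needed to see every example once (assumed rounding up).
steps_per_epoch = math.ceil(num_examples / total_batch_size)

# Passes over the data needed to reach max_steps.
epochs_needed = max_steps / steps_per_epoch

print(total_batch_size)          # 256
print(steps_per_epoch)           # 26
print(math.ceil(epochs_needed))  # 39
```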

Result:

epoch = 38.46
max_memory_allocated (GB) = 94.59
memory_allocated (GB) = 26.61
total_memory_available (GB) = 94.62
train_loss = 0.0485
train_runtime = 2:35:01.97
train_samples = 6540
train_samples_per_second = 27.521
train_steps_per_second = 0.108
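
The two throughput figures are consistent with the runtime and the step count from the log. As a minimal check, using only numbers reported in this run:

```python
# train_runtime = 2:35:01.97, converted to seconds.
train_runtime_s = 2 * 3600 + 35 * 60 + 1.97

total_steps = 1000       # "Total optimization steps" in the log
total_batch_size = 256   # "Total train batch size" in the log

samples_per_second = total_steps * total_batch_size / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(round(samples_per_second, 3))  # 27.521
print(round(steps_per_second, 3))    # 0.108
```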

I eventually got a memory allocation failure on this run in the prediction step.

regisss · 2024-03-01T10:37:14Z

I also updated the commands in the README and added two things to speed up the runs:

a label_features_max_length training argument that lets you specify a padding size for label features (defined here); otherwise they have many different shapes, which regularly triggers recompilation
--dataloader_num_workers 8: data loading is often a bottleneck for image and audio inputs, so this helps a lot
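
The idea behind a fixed label padding size can be illustrated with a minimal sketch. This is not the code from the PR: `pad_labels` and its parameters are hypothetical names, and plain Python lists stand in for tensors. The padding value -100 is the label value that PyTorch's cross-entropy loss ignores by default.

```python
def pad_labels(batch, max_length, pad_value=-100):
    """Pad every label sequence to the same fixed length.

    With a fixed length, every batch has the same label shape, so a
    shape-specialised compiled graph can be reused across batches.
    """
    padded = []
    for labels in batch:
        if len(labels) > max_length:
            raise ValueError(
                f"label sequence of length {len(labels)} exceeds {max_length}"
            )
        padded.append(labels + [pad_value] * (max_length - len(labels)))
    return padded


# Two batches whose longest sequences differ in length.
batch_a = [[5, 6, 7], [8, 9]]
batch_b = [[1], [2, 3, 4, 5, 6]]

padded_a = pad_labels(batch_a, max_length=8)
padded_b = pad_labels(batch_b, max_length=8)

# Both batches end up with identical shapes: 2 rows of 8 labels.
print([len(row) for row in padded_a])  # [8, 8]
print([len(row) for row in padded_b])  # [8, 8]
```

Padding each batch only to its own longest sequence would instead give lengths 3 and 5 here, that is, a new shape for each batch.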

I also decreased the batch size for evaluation as I also noticed an out-of-memory error otherwise.

I'm going to add a test in the CI (even though the WER may be bad, I'll find better hyperparameters later) and merge this PR today.

emascarenhas · 2024-03-01T15:08:07Z

@regisss, thanks for your help on this!

regisss

LGTM!

I added regression tests for Gaudi1 and Gaudi2. We can probably find better hyperparameters to get a much better WER but we can do that later.

emascarenhas · 2024-03-02T03:08:41Z

Yes. Thank you!

Support speech recognition with whisper models and seq2seq.

7b772c5

emascarenhas requested a review from regisss as a code owner February 12, 2024 04:47

regisss reviewed Feb 23, 2024

emascarenhas and others added 7 commits February 23, 2024 22:20

Update examples/speech-recognition/run_speech_recognition_seq2seq.py

62ddbb4

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/run_speech_recognition_seq2seq.py

f0b413b

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

e98fed7

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

27ff4a0

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

0b4239d

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Suggested reviewer edits

73ea11d

Merge branch 'huggingface:main' into main

15ea3c4

emascarenhas and others added 6 commits February 26, 2024 13:08

Update examples/speech-recognition/run_speech_recognition_seq2seq.py

b596381

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Fix readme toc and pass ruff.

07f91ef

Merge branch 'main' of https://github.com/emascarenhas/optimum-habana

efe9f73

Merge branch 'huggingface:main' into main

3882518

Merge branch 'main' of https://github.com/emascarenhas/optimum-habana

af30ed5

Merge branch 'main' of https://github.com/emascarenhas/optimum-habana

2acc253

emascarenhas closed this Feb 27, 2024

emascarenhas reopened this Feb 27, 2024

libinta reviewed Feb 27, 2024

regisss reviewed Feb 28, 2024

emascarenhas and others added 4 commits February 27, 2024 20:24

Update examples/speech-recognition/README.md

35d3dbf

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

b1cb614

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

04da3f4

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

5f0060b

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update examples/speech-recognition/README.md

8c671fd

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Update example commands

405ce4b

Add Gaudi2 CI

70545a7

Add Gaudi1 CI

4dc35df

regisss added the run-test label (Run CI for PRs from external contributors) Mar 1, 2024

regisss approved these changes Mar 2, 2024

regisss merged commit e9a1e57 into huggingface:main Mar 2, 2024


4 participants