Enabling Graphs in Wav2Vec AC training #622

Merged
libinta merged 4 commits into huggingface:main from bhargaveede:dev/wav2vec-ac on Jan 23, 2024

Conversation

@bhargaveede

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@bhargaveede bhargaveede requested a review from vivekgoe January 4, 2024 06:20
@bhargaveede bhargaveede requested a review from regisss as a code owner January 4, 2024 06:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bhargaveede bhargaveede added the run-test Run CI for PRs from external contributors label Jan 4, 2024
Comment thread tests/baselines/wav2vec2_base.json Outdated
@@ -35,8 +35,8 @@
"learning_rate": 5e-4,
"train_batch_size": 32,
"eval_accuracy": 0.7829483695652174,
Collaborator

Is the accuracy similar to what we get without HPU graphs? I think there was an issue with 1.12 but I have not tried it with 1.13.

@bhargaveede
Author

bhargaveede commented Jan 4, 2024

@regisss I got the following accuracy with HPU graphs:
eval_accuracy = 0.7984
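
One way to sanity-check a figure like this against the baseline value shown in the review thread is a simple tolerance comparison. The sketch below is illustrative only: `accuracy_not_regressed` and the 2% relative tolerance are assumptions made for the example, not what the optimum-habana test suite actually uses.

```python
# Values quoted in this thread.
BASELINE_EVAL_ACCURACY = 0.7829483695652174  # from tests/baselines/wav2vec2_base.json
HPU_GRAPHS_EVAL_ACCURACY = 0.7984            # reported with HPU graphs enabled


def accuracy_not_regressed(measured, baseline, rel_tolerance=0.02):
    """Return True if `measured` is no more than `rel_tolerance` below `baseline`.

    The 2% default is a hypothetical choice for this sketch.
    """
    return measured >= baseline * (1.0 - rel_tolerance)


delta = HPU_GRAPHS_EVAL_ACCURACY - BASELINE_EVAL_ACCURACY
print(f"absolute delta vs. baseline: {delta:+.4f}")
print("ok" if accuracy_not_regressed(HPU_GRAPHS_EVAL_ACCURACY, BASELINE_EVAL_ACCURACY) else "regressed")
```

A one-sided check is used here because a run that beats the baseline should not be flagged as a regression.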

@bhargaveede bhargaveede added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 4, 2024
@bhargaveede
Author

@regisss Can you please review the PR?

@regisss
Collaborator

regisss commented Jan 4, 2024

@regisss I was getting below accuracy with HPU graphs eval_accuracy = 0.7984

@bhargaveede Can you update the accuracies in the baseline with the ones you get with HPU graphs?

Another thing would be to run the speech recognition regression test. I know it doesn't use hpu_graphs_for_training, but it could still be impacted since the Wav2Vec2 modeling is changed.

@bhargaveede bhargaveede added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 5, 2024
@regisss
Collaborator

regisss commented Jan 5, 2024

@bhargaveede Did you use 1.13 or a newer version?
I get the following error with 1.13:

RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_DEVICE GRAPH:: Capture must end on the same stream it began on.

@bhargaveede
Author

@regisss I tested with 1.14

@regisss
Collaborator

regisss commented Jan 5, 2024

Okay, so let's wait for 1.14 to be released before merging this one then.

@bhargaveede bhargaveede added run-test Run CI for PRs from external contributors and removed run-test Run CI for PRs from external contributors labels Jan 23, 2024
@libinta
Collaborator

libinta commented Jan 23, 2024

The latest test shows:
***** train metrics *****
epoch = 10.0
max_memory_allocated (GB) = 94.59
memory_allocated (GB) = 17.55
total_memory_available (GB) = 94.62
train_loss = 0.9773
train_runtime = 0:01:48.44
train_samples_per_second = 3006.684
train_steps_per_second = 11.786
[INFO|trainer.py:1508] 2024-01-23 06:58:51,592 >> Using HPU graphs for inference.
[INFO|trainer.py:1528] 2024-01-23 06:58:51,592 >> ***** Running Evaluation *****
[INFO|trainer.py:1530] 2024-01-23 06:58:51,592 >> Num examples = 5888
[INFO|trainer.py:1533] 2024-01-23 06:58:51,592 >> Batch size = 64
100%|██████████| 12/12 [00:01<00:00, 11.58it/s]
***** eval metrics *****
epoch = 10.0
eval_accuracy = 0.8006
eval_loss = 1.0342
eval_runtime = 0:00:09.87
eval_samples_per_second = 596.288
eval_steps_per_second = 1.215
max_memory_allocated (GB) = 94.59
memory_allocated (GB) = 26.17
total_memory_available (GB) = 94.62
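
When comparing runs like the one above, the `key = value` metric lines can be read into a dictionary. This is only a minimal sketch: `parse_metrics` is a hypothetical helper, not part of transformers or optimum-habana, and the sample text copies just the eval metrics from the log above.

```python
# Eval metrics copied from the log above.
LOG = """\
epoch = 10.0
eval_accuracy = 0.8006
eval_loss = 1.0342
eval_runtime = 0:00:09.87
eval_samples_per_second = 596.288
eval_steps_per_second = 1.215
"""


def parse_metrics(text):
    """Parse 'key = value' lines into a dict; values that are not numbers stay strings."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip lines without a 'key = value' shape
        key, value = key.strip(), value.strip()
        try:
            metrics[key] = float(value)
        except ValueError:
            metrics[key] = value  # e.g. runtimes such as 0:00:09.87
    return metrics


metrics = parse_metrics(LOG)
print(metrics["eval_accuracy"], metrics["eval_runtime"])
```

Runtimes are left as strings on purpose, since their `h:mm:ss` format does not convert directly to a float.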

@libinta libinta merged commit 1714a65 into huggingface:main Jan 23, 2024
jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024
* Enabling Graphs in Wav2Vec AC training

* Updating gaudi 1 baseline

* Update wav2vec2_base.json