Disabling timers synchronization (#154) by bhargaveede · Pull Request #1879 · huggingface/optimum-habana

bhargaveede · 2025-03-25T03:35:52Z

This change improves performance. Without it, timer synchronization blocks on the host, which creates a small amount of idle time. Disabling that synchronization avoids the wait and results in better device utilization.
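For context, here is a minimal sketch of what disabling timer synchronization could look like in a DeepSpeed JSON config. It assumes the change relies on DeepSpeed's `timers.throughput.synchronized` option. The diff is not shown in this thread, so the exact keys touched in llama2_ds_zero3_config.json are an assumption, and the helper name and base config below are hypothetical.

```python
import json


def disable_timer_sync(config: dict) -> dict:
    """Return a copy of a DeepSpeed-style config with throughput-timer sync off.

    Assumption: the relevant setting is timers.throughput.synchronized.
    """
    updated = json.loads(json.dumps(config))  # cheap deep copy of JSON-like data
    throughput = updated.setdefault("timers", {}).setdefault("throughput", {})
    throughput.setdefault("enabled", True)  # keep the timer itself running
    throughput["synchronized"] = False      # skip device sync at timer start/stop
    return updated


# Hypothetical stand-in for the real config file contents.
base = {"zero_optimization": {"stage": 3}, "bf16": {"enabled": True}}
patched = disable_timer_sync(base)
print(json.dumps(patched, indent=2))
```

The original dict is left untouched, so the same helper could be applied to a config loaded from disk before writing it back out.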


HuggingFaceDocBuilderDev · 2025-03-25T03:40:20Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vidyasiv · 2025-03-26T19:29:31Z

Tested with the README command:

PT_HPU_MAX_COMPOUND_OP_SIZE=10 \
python3 examples/gaudi_spawn.py --use_deepspeed  --world_size 8  examples/language-modeling/run_lora_clm.py \
  --model_name_or_path meta-llama/Llama-2-70b-hf \
  --deepspeed  examples/language-modeling/llama2_ds_zero3_config.json \
  --dataset_name tatsu-lab/alpaca \
  --bf16 True \
  --output_dir ./lora_out \
  --num_train_epochs 2 \
  --max_seq_len 2048 \
  --per_device_train_batch_size 10 \
  --per_device_eval_batch_size 1 \
  --gradient_checkpointing \
  --eval_strategy epoch \
  --eval_delay 2 \
  --save_strategy no \
  --learning_rate 0.0018 \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --dataset_concatenation \
  --attn_softmax_bf16 True \
  --do_train \
  --do_eval \
  --use_habana \
  --use_lazy_mode \
  --pipelining_fwd_bwd \
  --throughput_warmup_steps 3 \
  --lora_rank 4 \
  --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" \
  --validation_split_percentage 4 \
  --use_flash_attention True \
  --flash_attention_causal_mask True \
  --fp8 True

Results with PR

***** train metrics *****
  epoch                       =        2.0
  max_memory_allocated (GB)   =      93.99
  memory_allocated (GB)       =      17.38
  total_flos                  =  1264185GF
  total_memory_available (GB) =      94.62
  train_loss                  =      0.901
  train_runtime               = 0:27:28.74
  train_samples_per_second    =      3.928
  train_steps_per_second      =      0.049
***** eval metrics *****
  epoch                           =        2.0
  eval_accuracy                   =     0.7915
  eval_graph_compliation_duration =     5.4882
  eval_loss                       =     0.7644
  eval_runtime                    = 0:00:21.41
  eval_samples                    =        125
  eval_samples_per_second         =      6.356
  eval_steps_per_second           =      0.818
  max_memory_allocated (GB)       =      93.99
  memory_allocated (GB)           =      17.38
  perplexity                      =     2.1478
  total_memory_available (GB)     =      94.62

Results on main (without PR)

***** train metrics *****
  epoch                       =        2.0
  max_memory_allocated (GB)   =       94.3
  memory_allocated (GB)       =      17.38
  total_flos                  =  1264185GF
  total_memory_available (GB) =      94.62
  train_loss                  =     0.9119
  train_runtime               = 0:28:09.94
  train_samples_per_second    =      3.823
  train_steps_per_second      =      0.048
***** eval metrics *****
  epoch                           =        2.0
  eval_accuracy                   =     0.7915
  eval_graph_compliation_duration =     5.7394
  eval_loss                       =     0.7638
  eval_runtime                    = 0:00:24.54
  eval_samples                    =        125
  eval_samples_per_second         =      5.376
  eval_steps_per_second           =      0.692
  max_memory_allocated (GB)       =       94.3
  memory_allocated (GB)           =      17.38
  perplexity                      =     2.1464
  total_memory_available (GB)     =      94.62
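To make the comparison easier to read, the sketch below computes the relative differences between the two runs from the metrics reported above. Variable and function names are illustrative. These are single runs, so part of the gap (roughly 2.7% on training throughput and about 18% on eval throughput) may be run-to-run noise.

```python
def to_seconds(hms: str) -> float:
    """Convert an 'H:MM:SS.ss' runtime string into seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def percent_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


# Values copied from the two metric dumps above.
with_pr = {"train_runtime": "0:27:28.74", "train_sps": 3.928,
           "eval_runtime": "0:00:21.41", "eval_sps": 6.356}
on_main = {"train_runtime": "0:28:09.94", "train_sps": 3.823,
           "eval_runtime": "0:00:24.54", "eval_sps": 5.376}

train_speedup = percent_change(with_pr["train_sps"], on_main["train_sps"])
eval_speedup = percent_change(with_pr["eval_sps"], on_main["eval_sps"])
train_saved = to_seconds(on_main["train_runtime"]) - to_seconds(with_pr["train_runtime"])

print(f"train samples/s change: {train_speedup:.2f}%")
print(f"eval samples/s change:  {eval_speedup:.2f}%")
print(f"train runtime saved:    {train_saved:.2f} s")
```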

vidyasiv

@bhargaveede thanks for your PR. Could you provide a description of the change and why it is needed?

bhargaveede · 2025-03-28T04:16:15Z

This change improves performance. Without it, timer synchronization blocks on the host, which creates a small amount of idle time. Disabling that synchronization avoids the wait and results in better device utilization.
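As a rough illustration of the idle time described here, the toy model below compares a host that waits for the device after every step (as a synchronized timer would force) with one that keeps enqueuing work. It is a simplified sketch, not a description of how DeepSpeed or the Gaudi runtime actually schedules work. The function name and the per-step costs are hypothetical.

```python
def total_time(steps: int, host_ms: float, device_ms: float, synchronized: bool) -> float:
    """Toy timeline: the host enqueues a step, the device runs steps in order."""
    host_clock = 0.0   # time at which the host is free to enqueue the next step
    device_free = 0.0  # time at which the device finishes all queued work
    for _ in range(steps):
        host_clock += host_ms                 # host prepares and enqueues the step
        start = max(host_clock, device_free)  # device starts once it is free
        device_free = start + device_ms
        if synchronized:
            host_clock = device_free          # synced timer: host waits for the device
    return device_free


# Hypothetical costs: 2 ms of host work and 10 ms of device work per step.
sync_ms = total_time(100, host_ms=2.0, device_ms=10.0, synchronized=True)
async_ms = total_time(100, host_ms=2.0, device_ms=10.0, synchronized=False)
print(f"synchronized: {sync_ms} ms, unsynchronized: {async_ms} ms")
```

In this model the synchronized case pays the host cost on every step because the device sits idle while the host prepares the next one, whereas the unsynchronized case hides host work behind device execution.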

vidyasiv

@regisss please take a look and let us know if any further testing is needed

regisss

LGTM

Disabling timers synchronization (#154)

3327c79

bhargaveede requested a review from regisss as a code owner March 25, 2025 03:35

Update llama2_ds_zero3_config.json

c6a8025

vidyasiv suggested changes Mar 27, 2025

vidyasiv approved these changes Mar 28, 2025

karol-brejna-i added the synapse 1.21 label Apr 7, 2025

libinta added the run-test Run CI for PRs from external contributors label Apr 16, 2025

regisss approved these changes Apr 17, 2025

regisss merged commit 029f8fb into huggingface:main Apr 17, 2025

regisss merged 2 commits into huggingface:main from HabanaAI:auto-pr-5fa4c45
