Add mark step and inplace residual add in llama model code to reduce memory consumption by puneeshkhanna · Pull Request #65 · HabanaAI/optimum-habana-fork

puneeshkhanna · 2024-02-23T11:46:46Z

The mark step reduces workspace memory by approximately twice the size of a
(BS, seq len, hidden dim) tensor.

The in-place add reduces persistent tensors by approximately twice the size of a
(BS, seq len, hidden dim) tensor.

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Mark step helps reduce workspace memory by approx twice the size of (BS, seq len, hidden dim). Inplace add helps reduce persistent tensors by approx twice the size of (BS, seq len, hidden dim).
Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>

puneeshkhanna · 2024-02-23T11:57:50Z

@dvarshney-habana - please review.
@libinta - Can you please check fine-tuning once?

vivekgoe

Please add the mark_step calls under the lazy mode flag. The same modeling file is also used for torch compile mode, where mark_step is not relevant.
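A minimal sketch of what guarding mark_step behind a lazy mode flag could look like. This is not the code from the PR: `run_decoder_layers`, its parameters, and the placement of the call before the layer loop are assumptions for illustration. The `mark_step` callable is injected so the sketch runs without Gaudi hardware; on a real system it would typically be `htcore.mark_step` from `habana_frameworks.torch.core`.

```python
from typing import Any, Callable, Sequence


def run_decoder_layers(
    layers: Sequence[Callable[[Any], Any]],
    hidden_states: Any,
    lazy_mode: bool,
    mark_step: Callable[[], None],
) -> Any:
    """Hypothetical helper: run layers, calling mark_step only in lazy mode."""
    if lazy_mode:
        # Only meaningful in lazy mode; skipped for torch compile mode.
        mark_step()
    for layer in layers:
        hidden_states = layer(hidden_states)
    return hidden_states


# Stand-in for htcore.mark_step that just records each call.
calls = []


def fake_mark_step() -> None:
    calls.append("mark_step")


layers = [lambda x: x + 1, lambda x: x * 2]

lazy_out = run_decoder_layers(layers, 1, lazy_mode=True, mark_step=fake_mark_step)
lazy_calls = len(calls)

compile_out = run_decoder_layers(layers, 1, lazy_mode=False, mark_step=fake_mark_step)
compile_calls = len(calls) - lazy_calls
```

With the flag off, the layers produce the same result and the stand-in is never called, which is the behavior requested for torch compile mode.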

puneeshkhanna · 2024-02-26T14:22:10Z

@MrGeva - You may want to review this. Accuracy seems fine. However, I still need to address the mark step comment from Vivek and check the fine-tuning script.

puneeshkhanna · 2024-02-26T15:29:59Z

@mandy-li - this PR is very important from memory usage perspective for llama inference.

As an example, take the config BS 172, seq len 2048, hidden dim 8192 (tensor size is ~5.3 GB) for llama-70B on 8x:
Max memory usage (without flash attention) reduced from ~86 GB to ~66 GB.
Max memory usage (with flash attention) reduced from ~70 GB to ~59 GB.
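The quoted tensor size can be checked with a short calculation. This sketch assumes bf16 activations (2 bytes per element) and a hidden dimension of 8192, the Llama-2-70B hidden size; `activation_bytes` is a hypothetical helper name, not something from the PR.

```python
def activation_bytes(batch_size: int, seq_len: int, hidden_dim: int,
                     bytes_per_element: int = 2) -> int:
    """Size in bytes of one (BS, seq len, hidden dim) tensor; bf16 assumed."""
    return batch_size * seq_len * hidden_dim * bytes_per_element


size_bytes = activation_bytes(172, 2048, 8192)
size_gib = size_bytes / 2**30

# Each fix is described as saving roughly twice this tensor.
saving_per_fix_gib = 2 * size_gib

print(f"one tensor: {size_gib:.3f} GiB, twice that: {saving_per_fix_gib:.2f} GiB")
```

Under these assumptions one tensor is about 5.4 GiB, in line with the ~5.3 GB quoted, so each of the two fixes would be expected to save on the order of 10 to 11 GiB.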

puneeshkhanna · 2024-02-26T15:31:49Z

@schoi-habana - Can you check fine-tuning once with this PR?

puneeshkhanna · 2024-02-27T08:26:31Z

@vivekgoe - lazy mode flag and check added.

vivekgoe · 2024-02-28T07:57:13Z

> @vivekgoe - lazy mode flag and check added.

LGTM.

puneeshkhanna · 2024-02-29T11:04:12Z

The in-place add causes a loss divergence issue during training, so I updated the PR to perform the in-place add only during inference.
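A minimal sketch of gating the residual add on training versus inference. This is not the PR's code: `residual_add` is a hypothetical helper, and `Buf` is a stand-in that mimics a tensor's `+` and `add_` so the sketch runs without PyTorch. With real `torch.Tensor` objects, `hidden_states + residual` allocates a new tensor, while `hidden_states.add_(residual)` writes into the existing one.

```python
class Buf:
    """Hypothetical stand-in for a tensor, supporting `+` and `add_`."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Out-of-place: allocates a new buffer for the result.
        return Buf(a + b for a, b in zip(self.values, other.values))

    def add_(self, other):
        # In-place: overwrites this buffer and returns it.
        for i, b in enumerate(other.values):
            self.values[i] += b
        return self


def residual_add(hidden_states, residual, training):
    """Out-of-place add while training, in-place add for inference."""
    if training:
        return hidden_states + residual
    return hidden_states.add_(residual)


residual = Buf([10.0, 20.0])

h_infer = Buf([1.0, 2.0])
out_infer = residual_add(h_infer, residual, training=False)
infer_reuses_buffer = out_infer is h_infer

h_train = Buf([1.0, 2.0])
out_train = residual_add(h_train, residual, training=True)
train_reuses_buffer = out_train is h_train
```

Both paths give the same values; only the inference path reuses the input buffer, which is where the saving in persistent tensors would come from.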

Ran the command below without any of the fixes in this PR:
PT_HPU_MAX_COMPOUND_OP_SIZE=10 DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py --model_name_or_path /mnt/weka/data/pytorch/llama2/Llama-2-70b-hf --deepspeed llama2_ds_zero3_config.json --dataset_name tatsu-lab/alpaca --bf16 True --output_dir ./lora_out --num_train_epochs 1 --max_seq_len 2048 --per_device_train_batch_size 10 --per_device_eval_batch_size 10 --gradient_checkpointing --evaluation_strategy epoch --eval_delay 2 --save_strategy no --learning_rate 0.0018 --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --dataset_concatenation --attn_softmax_bf16 True --do_train --do_eval --use_habana --use_lazy_mode --pipelining_fwd_bwd --throughput_warmup_steps 3 --lora_rank 4 --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" --validation_split_percentage 4 --use_flash_attention True
{
"epoch": 1.0,
"eval_accuracy": 0.791171470444553,
"eval_loss": 0.7647133469581604,
"eval_runtime": 27.0564,
"eval_samples": 125,
"eval_samples_per_second": 4.62,
"eval_steps_per_second": 0.074,
"max_memory_allocated (GB)": 81.61,
"memory_allocated (GB)": 26.91,
"perplexity": 2.148378447163791,
"total_memory_available (GB)": 94.62,
"train_loss": 0.8714751173288394,
"train_runtime": 1321.3399,
"train_samples_per_second": 2.628,
"train_steps_per_second": 0.033
}

Ran the command below with the updated changes in this PR (only the mark step fix applies to fine-tuning):
PT_HPU_MAX_COMPOUND_OP_SIZE=10 DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py --model_name_or_path /mnt/weka/data/pytorch/llama2/Llama-2-70b-hf --deepspeed llama2_ds_zero3_config.json --dataset_name tatsu-lab/alpaca --bf16 True --output_dir ./lora_out --num_train_epochs 1 --max_seq_len 2048 --per_device_train_batch_size 10 --per_device_eval_batch_size 10 --gradient_checkpointing --evaluation_strategy epoch --eval_delay 2 --save_strategy no --learning_rate 0.0018 --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --dataset_concatenation --attn_softmax_bf16 True --do_train --do_eval --use_habana --use_lazy_mode --pipelining_fwd_bwd --throughput_warmup_steps 3 --lora_rank 4 --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" --validation_split_percentage 4 --use_flash_attention True
{
"epoch": 1.0,
"eval_accuracy": 0.7912496336101612,
"eval_loss": 0.7647190690040588,
"eval_runtime": 26.8381,
"eval_samples": 125,
"eval_samples_per_second": 4.658,
"eval_steps_per_second": 0.075,
"max_memory_allocated (GB)": 81.58,
"memory_allocated (GB)": 26.91,
"perplexity": 2.148390740319044,
"total_memory_available (GB)": 94.62,
"train_loss": 0.8714751173288394,
"train_runtime": 1319.6586,
"train_samples_per_second": 2.672,
"train_steps_per_second": 0.034
}

@libinta , @schoi-habana - FYI.
log_lora_without_fixes.txt
log_lora_with_fixes.txt

…memory consumption (#65)

* Add mark step and inplace add. Mark step helps reduce workspace memory by approx twice the size of (BS, seq len, hidden dim). Inplace add helps reduce persistent tensors by approx twice the size of (BS, seq len, hidden dim). Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>
* Add lazy mode parameter
* Move mark step within the loop
* Move mark step before the loop
* Fix indentation
* update in place add only for inference

Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>

astachowiczhabana · 2024-06-07T14:15:26Z

huggingface#833

Add mark step and inplace add.

389938a

puneeshkhanna requested review from libinta and mandy-li as code owners February 23, 2024 11:46

puneeshkhanna requested a review from a user February 23, 2024 11:46

ghost approved these changes Feb 24, 2024

vivekgoe suggested changes Feb 26, 2024

vivekgoe requested a review from hlahkar February 26, 2024 11:56

Merge branch 'HabanaAI:habana-main' into llama_prefill_memoryfixes

69b5b32

Add lazy mode parameter

094f0b7

puneeshkhanna requested review from bhargaveede and ssarkar2 as code owners February 27, 2024 08:24

Move mark step within the loop

586e8ce

puneeshkhanna mentioned this pull request Feb 28, 2024

Split the graphs to run with flash_attention on 1x #75

Merged

vivekgoe approved these changes Feb 28, 2024

Puneesh Khanna added 2 commits February 28, 2024 13:45

Move mark step before the loop

f34fae3

Fix indentation

6c6e5da

update in place add only for inference

8eab266

ghost merged commit 725a6a3 into HabanaAI:habana-main Feb 29, 2024

schoi-habana mentioned this pull request Apr 8, 2024

Update Mixtral-8x7B Optimization huggingface/optimum-habana#836

Closed

astachowiczhabana pushed a commit that referenced this pull request Jan 17, 2025

Fix graph breaks in Mixtral (#65)

9c0fbdc

Solaryee added a commit that referenced this pull request Jan 20, 2025

Fix graph breaks in Mixtral (#65)

bdc7332

ugolowic pushed a commit that referenced this pull request Feb 20, 2025

Fix graph breaks in Mixtral (#65) (huggingface#1705)

6a520ff

xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025

Fix graph breaks in Mixtral (#65)

c3ec015

This pull request was closed.

7 commits merged into HabanaAI:habana-main from puneeshkhanna:llama_prefill_memoryfixes
