Add mark step and in-place residual add in llama model code to reduce memory consumption #65
The mark step reduces workspace memory by approximately twice the size of a (BS, seq len, hidden dim) tensor. The in-place add reduces persistent tensors by approximately twice the size of a (BS, seq len, hidden dim) tensor. Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>
@dvarshney-habana - please review.
vivekgoe left a comment:
Please guard the mark_step calls with the lazy mode flag. The same modeling file is also used for torch compile mode, where mark_step is not relevant.
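A minimal sketch of what guarding the call could look like, assuming the Habana bridge is importable as `habana_frameworks.torch.core`. The helper name `maybe_mark_step`, the `run_layers` loop, and the placement of the call before the loop are illustrative assumptions, not the code from this PR.

```python
def maybe_mark_step(lazy_mode: bool) -> bool:
    """Call htcore.mark_step() only in lazy mode.

    Returns True if a step was marked, False otherwise.
    Hypothetical helper for illustration.
    """
    if not lazy_mode:
        # torch compile / eager paths: mark_step is not relevant, skip it.
        return False
    try:
        import habana_frameworks.torch.core as htcore
    except ImportError:
        # Habana bridge not installed; nothing to flush.
        return False
    htcore.mark_step()  # ends the graph accumulated so far in lazy mode
    return True


def run_layers(layers, hidden_states, lazy_mode: bool = True):
    """Apply decoder layers in order, marking a step before the loop.

    The placement before the loop is an assumption for this sketch.
    """
    maybe_mark_step(lazy_mode)
    for layer in layers:
        hidden_states = layer(hidden_states)
    return hidden_states
```

With `lazy_mode=False` the helper is a no-op, so the same modeling code can be shared with torch compile mode.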
@MrGeva - You may want to review this. Accuracy seems fine. However, I still need to address Vivek's mark step comment and check the finetuning script.
@mandy-li - This PR is very important for memory usage in llama inference. As an example, for the config of BS 172, seq len 2048, hidden dim 8192 for llama-70B on 8x, a single (BS, seq len, hidden dim) tensor is ~5.3 GB.
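The size quoted above can be checked with a few lines of arithmetic. This sketch assumes a hidden dim of 8192 (the Llama-70B hidden size) and 2 bytes per element (bf16 or fp16); the element size is not stated in the thread, and the function name is hypothetical.

```python
def activation_bytes(batch_size: int, seq_len: int, hidden_dim: int,
                     bytes_per_element: int = 2) -> int:
    """Bytes needed for one (BS, seq len, hidden dim) tensor."""
    return batch_size * seq_len * hidden_dim * bytes_per_element


# Config from the comment: BS 172, seq len 2048, hidden dim 8192.
size = activation_bytes(172, 2048, 8192)
size_gib = size / 2**30  # 5.375 GiB under the 2-byte assumption
print(f"one tensor: {size} bytes ({size_gib} GiB)")
print(f"two tensors: {2 * size_gib} GiB")
```

Under these assumptions one tensor is 5.375 GiB, consistent with the ~5.3 GB figure, so saving roughly two such tensors corresponds to about 10.75 GiB.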
@schoi-habana - Can you check finetuning once with this PR?
@vivekgoe - Lazy mode flag and check added.
LGTM.
The in-place add causes a loss divergence issue during training, so I updated the PR to perform the in-place add only in inference.
Ran the command below without any of the fixes in this PR:
Ran the command below with the updated changes in this PR (only the mark step fix applies to finetuning):
@libinta, @schoi-habana - FYI.
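A minimal sketch of the inference-only in-place residual add, written against PyTorch's `Tensor.add_` and `+` semantics. The function name `residual_add` and the choice of which tensor is overwritten are assumptions for illustration; the PR's actual code may differ.

```python
def residual_add(hidden_states, residual, training: bool):
    """Add the residual connection, reusing memory only in inference.

    Hypothetical helper; works with any tensor type that supports
    `+` and an in-place `add_` (for example torch.Tensor).
    """
    if training:
        # Out-of-place add: allocates a new tensor. Kept for training,
        # where the thread reports loss divergence with the in-place add.
        return residual + hidden_states
    # In-place add: writes the sum into hidden_states' existing storage,
    # so no extra (BS, seq len, hidden dim) tensor is kept alive.
    hidden_states.add_(residual)
    return hidden_states
```

Inside a module, the `training` argument would typically come from `self.training`, so calling `model.eval()` selects the in-place path.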
…memory consumption (#65)
* Add mark step and inplace add. Mark step helping in reducing workspace memory by approx twice of (BS, seq len, hidden dim). Inplace add helping in reducing persistent tensors by approx twice of (BS, seq len, hidden dim). Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>
* Add lazy mode parameter
* Move mark step within the loop
* Move mark step before the loop
* Fix indentation
* update in place add only for inference
---------
Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>