Refactoring LLama Attention and mlp layers by bgoldberg-habana · Pull Request #589 · huggingface/optimum-habana

bgoldberg-habana · 2023-12-08T11:30:29Z

Module for scope linearAllreduce
This change reduces memory consumption and enables better optimizations in Synapse when running Llama 70B with DeepSpeed.
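
For readers unfamiliar with the pattern, here is a minimal sketch, assuming "linearAllreduce" refers to the usual tensor-parallel arrangement of a linear layer followed by an all-reduce. The weight is split along its input dimension, each device computes a partial output from its slice, and the all-reduce sums the partials. This is a plain-Python simulation for illustration only, not the code from this PR; the names `matvec`, `sharded_linear_allreduce` and `num_ranks` are hypothetical.

```python
def matvec(weight, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def sharded_linear_allreduce(weight, x, num_ranks):
    """Simulate a row-parallel linear layer followed by an all-reduce.

    The input dimension is split evenly across `num_ranks` simulated devices.
    Each device computes a partial output from its slice; the all-reduce is
    modeled as an element-wise sum of those partial outputs.
    """
    cols = len(x)
    assert cols % num_ranks == 0, "input dimension must divide evenly"
    shard = cols // num_ranks
    partials = []
    for rank in range(num_ranks):
        lo, hi = rank * shard, (rank + 1) * shard
        w_slice = [row[lo:hi] for row in weight]  # this rank's weight columns
        partials.append(matvec(w_slice, x[lo:hi]))
    # All-reduce (sum): every rank ends up holding the same full result.
    return [sum(vals) for vals in zip(*partials)]


# Small integer example so the comparison is exact.
weight = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, -1, 2]
full = matvec(weight, x)
sharded = sharded_linear_allreduce(weight, x, num_ranks=2)
print(full, sharded)
```

The sharded result matches the unsharded matrix-vector product, which is the property that lets the linear layer and its all-reduce be treated as one unit.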

Module for scope linearAllreduce this change allows better memory consumption and better optimizations in synapse Change-Id: I3a30a09d6d61aece7ce605bb672e1485d3fbe1cc

HuggingFaceDocBuilderDev · 2023-12-08T11:36:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

MrGeva · 2023-12-10T18:34:14Z

LGTM

regisss

I just left a last comment that will be addressed quickly.

Besides, do you have numbers to see the kind of memory that is saved doing this?

bgoldberg-habana · 2023-12-11T09:01:42Z

Command line:
ENABLE_SYNAPSE_QUANTIZATION=false USE_DEFAULT_QUANT_PARAM=true UPDATE_GRAPH_OUTPUT_MME=false ENABLE_CALC_DYNAMIC_RANGE=false ENABLE_EXPERIMENTAL_FLAGS=true deepspeed --num_gpus 8 run_generation.py --model_name_or_path /mnt/weka/data/pytorch/llama2/Llama-2-70b-hf/ --use_hpu_graphs --use_kv_cache --kv_cache_fp8 --batch_size 50 --fp8 --reuse_cache --trim_logits --n_iterations 5 --attn_softmax_bf16 --limit_hpu_graphs --max_new_tokens 2048 --max_input_tokens 2048

Note that I'm already running on 1.14, but I don't think the numbers changed much from 1.13.

With this change:
Throughput (including tokenization) = 1581.191910099665 tokens/second
Number of HPU graphs = 333
Memory allocated = 19.07 GB
Max memory allocated = 49.15 GB
Total memory available = 94.62 GB
Graph compilation duration = 524.4125659640013 seconds

Reference (without this change):
Throughput (including tokenization) = 1257.5571168775869 tokens/second
Number of HPU graphs = 333
Memory allocated = 27.33 GB
Max memory allocated = 87.02 GB
Total memory available = 94.62 GB
Graph compilation duration = 542.6321858290012 seconds
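
To make the comparison easier to read, the following sketch computes the relative change of each metric between the two runs. The figures are copied from the output above; the dictionary keys and the helper `relative_change` are hypothetical names chosen for this illustration.

```python
# Figures copied from the two runs reported above.
with_change = {
    "throughput_tokens_per_s": 1581.191910099665,
    "memory_allocated_gb": 19.07,
    "max_memory_allocated_gb": 49.15,
    "graph_compilation_s": 524.4125659640013,
}
reference = {
    "throughput_tokens_per_s": 1257.5571168775869,
    "memory_allocated_gb": 27.33,
    "max_memory_allocated_gb": 87.02,
    "graph_compilation_s": 542.6321858290012,
}


def relative_change(new, old):
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


changes = {
    key: relative_change(with_change[key], reference[key]) for key in with_change
}

for key, pct in changes.items():
    print(f"{key}: {reference[key]:.2f} -> {with_change[key]:.2f} ({pct:+.1f}%)")
```

On these numbers, throughput is roughly 26% higher and peak memory roughly 44% lower with the change, while the number of HPU graphs is unchanged at 333.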

Refactoring LLama Attention and mlp layers

cfea517

bgoldberg-habana requested review from libinta and mandy-li as code owners December 8, 2023 11:30

bgoldberg-habana requested a review from a user December 8, 2023 11:30

bgoldberg-habana requested a review from regisss as a code owner December 8, 2023 11:30

regisss reviewed Dec 8, 2023

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py Outdated

Comment thread optimum/habana/transformers/models/modeling_all_models.py Outdated

fix CR comments

0d082b2

bgoldberg-habana requested a review from regisss December 10, 2023 18:36

regisss reviewed Dec 10, 2023

Comment thread optimum/habana/transformers/models/llama/modeling_llama.py

fix cr comments

d3782ae

bgoldberg-habana requested a review from regisss December 11, 2023 09:13

regisss merged commit afea217 into main Dec 11, 2023

regisss deleted the scope branch December 11, 2023 13:46

regisss mentioned this pull request Dec 11, 2023

Support for FlashAttention in Llama2 #584

Merged

schoi-habana mentioned this pull request Mar 29, 2024

Update Mixtral-8x7B Optimization #836

Closed

regisss merged 3 commits into main from scope


4 participants
