Add mark_step for llama inference by libinta · Pull Request #875 · huggingface/optimum-habana

libinta · 2024-04-08T21:43:19Z

What does this PR do?

Adds extra mark_step calls to Llama inference for better memory optimization.
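A minimal sketch of what "extra mark_step for inference" can look like. It assumes a PyTorch model running in Gaudi lazy mode, where operations are accumulated into a graph until `mark_step()` is called. `ToyDecoder`, the `lazy_mode` flag, and the per-layer placement of the call are hypothetical, because this PR's description does not say where the extra calls were added. The inference-only guard mirrors the title of the referenced fork PR #126 ("Add mark_step only for inference"). `habana_frameworks.torch.core.mark_step` is the Habana bridge call. On machines without it, a no-op fallback lets the sketch run on CPU.

```python
import torch
from torch import nn

try:
    # Habana PyTorch bridge: only present on Gaudi (HPU) machines.
    import habana_frameworks.torch.core as htcore
    _backend_mark_step = htcore.mark_step
except ImportError:
    def _backend_mark_step():
        """No-op fallback so the sketch also runs on CPU."""

mark_step_calls = 0


def mark_step():
    """Cut the lazy-mode graph here and count how often we did so."""
    global mark_step_calls
    mark_step_calls += 1
    _backend_mark_step()


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a stack of Llama decoder layers."""

    def __init__(self, hidden_size: int = 16, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)
        )

    def forward(self, hidden_states: torch.Tensor, lazy_mode: bool = True) -> torch.Tensor:
        for layer in self.layers:
            # Residual connection around each (toy) layer.
            hidden_states = hidden_states + layer(hidden_states)
            # Inference only (assumed placement): end the accumulated graph
            # after each layer so it is executed in smaller pieces.
            if lazy_mode and not self.training:
                mark_step()
        return hidden_states


model = ToyDecoder().eval()
with torch.no_grad():
    out = model(torch.randn(2, 16))
print(out.shape, mark_step_calls)  # torch.Size([2, 16]) 4
```

Cutting the graph more often can trade some extra graph launches for smaller graphs; whether it actually lowers peak memory for a given model and batch size has to be measured, which is the question the reviewer raises in this thread.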


Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?


HuggingFaceDocBuilderDev · 2024-04-09T07:18:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Puneesh Khanna <pkhanna@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Sayantan Sarkar <supersarkar@gmail.com> Co-authored-by: Puneesh Khanna <pkhanna@habana.ai> Co-authored-by: Witold Szczurek <152967125+wszczurekhabana@users.noreply.github.com>

ZhaiFeiyue · 2024-04-09T07:47:18Z

@libinta How much memory can be saved with this PR? Is there any perf data I can refer to?


* port llama related changes/optimizations to mistral if applicable. * add mark step as in huggingface#875 * add fusedrope optimization for mistral * add fused rope condition back in


regisss and others added 12 commits March 29, 2024 23:08

Release: v1.11.0

d9d4fc0

Fix fp8 ci (#852)

4160e9c

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Fix PR #848 (#853)

b0eefc5

Disable safe loading tests in CI (#854)

8ee87de

Update QA example

1c7b2cd

Update Bert large Gaudi1 CI baseline

c87d312

Add warmup for eval (#855)

445be21

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Fix mistral after syn1.15 update (#858)

eaac913

Fp8 merge fix (#863)

58503c5

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Add mark step and inplace residual add in llama model code (#833)

84e8241

Signed-off-by: Puneesh Khanna <pkhanna@habana.ai>

Enable Flash Attention in recompute and causal modes (#21) (#862)

af85fd0

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com> Co-authored-by: Libin Tang <litang@habana.ai>

Add mark_step for inference(Propagate OHF PRs 126, 96, 75)

8cdeee1

libinta requested review from bhargaveede, mandy-li, ssarkar2 and vivekgoe as code owners April 8, 2024 21:43

libinta requested a review from a user April 8, 2024 21:43

libinta requested a review from regisss as a code owner April 8, 2024 21:43

libinta changed the base branch from main to v1.11-release April 8, 2024 21:44

kalyanjk approved these changes Apr 9, 2024


regisss changed the base branch from v1.11-release to main April 9, 2024 06:58

Merge branch 'main' into llama_markstep

e06a896

regisss approved these changes Apr 9, 2024


regisss merged commit 0b14d8e into main Apr 9, 2024

regisss deleted the llama_markstep branch April 9, 2024 07:31

skaulintel added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Apr 11, 2024

add mark step as in huggingface#875

105ef87

skaulintel added a commit to HabanaAI/optimum-habana-fork that referenced this pull request Apr 12, 2024

Add marksteps to optimize mistral text generation performance (#164)

9114f42

* port llama related changes/optimizations to mistral if applicable. * add mark step as in huggingface#875

This was referenced Jun 12, 2024

Add mark_step only for inference HabanaAI/optimum-habana-fork#126

Merged

Split the graphs to run with flash_attention on 1x HabanaAI/optimum-habana-fork#75

Merged

skavulya mentioned this pull request Feb 13, 2025

DeepSeek_v3 support #1735

Merged



regisss merged 13 commits into main from llama_markstep

libinta commented Apr 8, 2024

Uh oh!

HuggingFaceDocBuilderDev commented Apr 9, 2024

Uh oh!

ZhaiFeiyue commented Apr 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

libinta commented Apr 8, 2024

What does this PR do?

Before submitting

Uh oh!

HuggingFaceDocBuilderDev commented Apr 9, 2024

Uh oh!

ZhaiFeiyue commented Apr 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants