llama fp8 - enable non reuse cache flow for fp8 (#64) by HolyFalafel · Pull Request #766 · huggingface/optimum-habana

HolyFalafel · 2024-03-05T14:37:20Z

llama fp8 - enable non reuse cache flow for fp8

remove deprecated kv cache fp8 flag

Change-Id: Id76f94a127dee202376e8f27de7b28f58affedae

fixing lm eval

Change-Id: I230fa53e7b49d8bb36397b063f652ba3def84600

remove old quantization mode

Change-Id: I538172f29870311349ed79d928cfacc60fb534e8

Add a disk_offload flag that controls device_map=auto. Setting this flag enables offloading weights to disk when CPU memory runs out (OOM). Add a const serialization path flag that takes a path where const sections are serialized, so that if the device has no space to hold all const sections, they are offloaded to disk.
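A minimal sketch of how the two flags described above could be wired up, not the code from this PR. The flag names `--disk_offload` and `--const_serialization_path` come from this conversation; the helper `build_load_kwargs`, the default offload folder, and the choice to pass `offload_folder` alongside `device_map="auto"` are assumptions made for illustration. How the serialization path is handed to the Habana runtime is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two flags discussed in the PR description."""
    parser = argparse.ArgumentParser(description="Sketch of the offload-related flags")
    parser.add_argument(
        "--disk_offload",
        action="store_true",
        help="Load the model with device_map=auto so weights can spill to disk.",
    )
    parser.add_argument(
        "--const_serialization_path",
        type=str,
        default=None,
        help="Directory where const sections are serialized.",
    )
    return parser


def build_load_kwargs(args: argparse.Namespace, offload_folder: str = "/tmp/offload_folder") -> dict:
    """Hypothetical helper: turn the parsed flags into model-loading keyword arguments.

    `device_map` and `offload_folder` are keyword arguments accepted by
    `from_pretrained` in Transformers; the folder path is an assumed default.
    """
    kwargs = {}
    if args.disk_offload:
        kwargs["device_map"] = "auto"
        kwargs["offload_folder"] = offload_folder
    return kwargs


# Example: both flags set on the command line.
args = build_parser().parse_args(
    ["--disk_offload", "--const_serialization_path", "/tmp/const_sections"]
)
load_kwargs = build_load_kwargs(args)

# Example: no flags set, so loading falls back to its defaults.
default_args = build_parser().parse_args([])
default_kwargs = build_load_kwargs(default_args)
```

The resulting dictionary would be unpacked into the model-loading call, so that leaving `--disk_offload` unset keeps the default loading path unchanged.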

HuggingFaceDocBuilderDev · 2024-03-08T08:21:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss · 2024-03-11T09:48:09Z

@HolyFalafel There is some overlap between this PR and #780, I see the same changes for --const_serialization_path

HolyFalafel · 2024-03-11T10:07:55Z

> @HolyFalafel There is some overlap between this PR and #780, I see the same changes for --const_serialization_path

Yes, in order to avoid merge conflicts, I had to base this commit on top of the commit which is relevant to #780.
So we should merge the other one first.

szutenberg · 2024-03-13T10:17:31Z

Hi @HolyFalafel ,

I see that "Code check quality" tests are failing. Could you fix formatting with ruff?
Any ETA to have these changes merged?

regisss · 2024-03-13T10:32:32Z

> Hi @HolyFalafel ,
>
> I see that "Code check quality" tests are failing. Could you fix formatting with ruff? Any ETA to have these changes merged?

We need to merge #780 first. I'll try it out with Synapse 1.14, but I'm not sure it will work. If it does, I'll merge it; otherwise we'll need to wait for the release of Synapse 1.15.

regisss · 2024-03-22T22:13:34Z

@HolyFalafel We'll merge this PR in the synapse_1.15 branch but there are merge conflicts. I think it's due to the upgrade to Transformers 4.38, can you check please?

Yantom1 and others added 2 commits March 5, 2024 16:30

HolyFalafel requested review from bhargaveede, libinta, mandy-li, ssarkar2 and vivekgoe as code owners March 5, 2024 14:37

HolyFalafel requested a review from a user March 5, 2024 14:37

HolyFalafel requested a review from regisss as a code owner March 5, 2024 14:37

libinta reviewed Mar 6, 2024

Comment thread examples/text-generation/utils.py Outdated

libinta added the run-test (Run CI for PRs from external contributors) and synapse 1.15 labels Mar 6, 2024

Remove setup_inference() in utils.py

157634d

libinta approved these changes Mar 8, 2024

stylize code

c55cb61

libinta reviewed Mar 15, 2024

Comment thread optimum/habana/transformers/generation/configuration_utils.py

Re-add bucket_internal, mistakenly removed from cherry-pick

36c529f

Update configuration_utils.py

libinta approved these changes Mar 18, 2024

add ENABLE_CONST_MARKING flag in OH

b6cf31d

regisss changed the base branch from main to synapse_1.15 March 22, 2024 22:12

Merge branch 'synapse_1.15' into dev/dsemiat/llama_non_reuse_cache_la…

9384c26

…test

regisss approved these changes Mar 25, 2024

regisss merged commit a6d2b54 into huggingface:synapse_1.15 Mar 25, 2024

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025

Remove dead code from modeling_qwen3_moe.py and modeling_qwen2.py (hu…

d98900e

…ggingface#2302) (huggingface#766) Co-authored-by: Karol Brejna <karolbrejna@apache.org>

gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 6, 2025

Remove dead code from modeling_qwen3_moe.py and modeling_qwen2.py (hu…

eca8ce2

…ggingface#2302) (huggingface#766) Co-authored-by: Karol Brejna <karolbrejna@apache.org>

regisss merged 7 commits into huggingface:synapse_1.15 from HabanaAI:dev/dsemiat/llama_non_reuse_cache_latest

7 participants
