llama fp8 - enable non reuse cache flow for fp8 (#64) #766

Merged
regisss merged 7 commits into huggingface:synapse_1.15 from HabanaAI:dev/dsemiat/llama_non_reuse_cache_latest on Mar 25, 2024
Conversation

@HolyFalafel
Contributor

  • llama fp8 - enable non reuse cache flow for fp8

remove deprecated kv cache fp8 flag

Change-Id: Id76f94a127dee202376e8f27de7b28f58affedae

  • fixing lm eval

Change-Id: I230fa53e7b49d8bb36397b063f652ba3def84600

  • remove old quantization mode

Change-Id: I538172f29870311349ed79d928cfacc60fb534e8

Yantom1 and others added 2 commits March 5, 2024 16:30
Add a disk_offload flag that controls device_map=auto. Setting this flag enables offloading weights
to disk when CPU memory runs out.
Add a const serialization path flag that takes the path where const sections are serialized,
so that if the device has no space to hold all const sections, they are offloaded to disk.
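
To make the two flags in the commit message above concrete, here is a minimal sketch of how such options could be parsed and turned into model-loading arguments. It is not the code from this PR: the helper names `parse_args`, `build_model_kwargs`, and `prepare_const_dir`, and the default `offload_folder` value, are hypothetical. The only details taken from the conversation are the flag names and the fact that `disk_offload` controls `device_map=auto`.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the two flags described in the commit message (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the offload-related flags")
    parser.add_argument(
        "--disk_offload",
        action="store_true",
        help="Let weights spill to disk when CPU memory is exhausted.",
    )
    parser.add_argument(
        "--const_serialization_path",
        type=str,
        default=None,
        help="Directory where const sections are serialized.",
    )
    return parser.parse_args(argv)


def build_model_kwargs(args, offload_folder="offload"):
    """Translate the flags into keyword arguments for model loading.

    Assumption: with disk offload enabled, the model is loaded with
    device_map="auto" plus a folder for offloaded weights.
    """
    kwargs = {}
    if args.disk_offload:
        kwargs["device_map"] = "auto"
        kwargs["offload_folder"] = offload_folder  # hypothetical default name
    return kwargs


def prepare_const_dir(args):
    """Create the const serialization directory if a path was given."""
    if args.const_serialization_path is None:
        return None
    os.makedirs(args.const_serialization_path, exist_ok=True)
    return args.const_serialization_path
```

In a real script, the resulting dictionary would presumably be unpacked into a call such as `AutoModelForCausalLM.from_pretrained(model_name, **kwargs)`. How the serialization directory is handed to the device runtime is not shown in this conversation, so the sketch stops at creating it.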
@HolyFalafel HolyFalafel requested a review from a user March 5, 2024 14:37
@HolyFalafel HolyFalafel requested a review from regisss as a code owner March 5, 2024 14:37
Comment thread examples/text-generation/utils.py Outdated
@libinta libinta added the run-test (Run CI for PRs from external contributors) and synapse 1.15 labels Mar 6, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss
Collaborator

regisss commented Mar 11, 2024

@HolyFalafel There is some overlap between this PR and #780, I see the same changes for --const_serialization_path

@HolyFalafel
Contributor Author

> @HolyFalafel There is some overlap between this PR and #780, I see the same changes for --const_serialization_path

Yes, in order to avoid merge conflicts, I had to base this commit on top of the commit which is relevant to #780.
So we should merge the other one first.

@szutenberg
Contributor

Hi @HolyFalafel ,

I see that "Code check quality" tests are failing. Could you fix formatting with ruff?
Any ETA to have these changes merged?

@regisss
Collaborator

regisss commented Mar 13, 2024

> Hi @HolyFalafel ,
>
> I see that "Code check quality" tests are failing. Could you fix formatting with ruff? Any ETA to have these changes merged?

We need to merge #780 first. I'll try it out with Synapse 1.14, but I'm not sure it will work. If it does, I'll merge it; otherwise we'll need to wait for the release of Synapse 1.15.

Comment thread optimum/habana/transformers/generation/configuration_utils.py
@regisss regisss changed the base branch from main to synapse_1.15 March 22, 2024 22:12
@regisss
Collaborator

regisss commented Mar 22, 2024

@HolyFalafel We'll merge this PR into the synapse_1.15 branch, but there are merge conflicts. I think they are due to the upgrade to Transformers 4.38. Can you check, please?

@regisss regisss merged commit a6d2b54 into huggingface:synapse_1.15 Mar 25, 2024
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 6, 2025