enable internal kv bucket in llama #11

Closed
xt574chen wants to merge 0 commits into HabanaAI:habana-main from xt574chen:llama_kv_bucket

@xt574chen

What does this PR do?

To improve throughput when generating many new tokens, this PR splits the KV cache into buckets of a fixed width and computes attention over only the portion filled so far, rounded up to a multiple of the bucket width, rather than over the entire KV cache. To enable the feature, add --bucket_size=128 --bucket_internal to the command.
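The bucketing idea can be illustrated with a minimal pure-Python sketch. This is not the code from this PR: the helper names bucketed_kv_length and active_kv_slice are hypothetical, and a flat list stands in for the real KV cache tensors. It only shows how the attended length grows in steps of the bucket width instead of always covering the full cache.

```python
def bucketed_kv_length(tokens_so_far: int, bucket_size: int, max_cache_len: int) -> int:
    """Smallest multiple of bucket_size that covers tokens_so_far, capped at the cache size.

    Hypothetical helper for illustration; not part of the PR.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    num_buckets = -(-tokens_so_far // bucket_size)  # ceiling division
    return min(num_buckets * bucket_size, max_cache_len)


def active_kv_slice(kv_cache, tokens_so_far: int, bucket_size: int):
    """Return only the leading part of the cache that attention needs to read."""
    length = bucketed_kv_length(tokens_so_far, bucket_size, len(kv_cache))
    return kv_cache[:length]


if __name__ == "__main__":
    cache = list(range(2048))  # stand-in for a preallocated KV cache of 2048 positions
    for step in (1, 128, 129, 300, 2048):
        # Attention would run over this many positions instead of all 2048.
        print(step, len(active_kv_slice(cache, step, 128)))
```

With a bucket width of 128, the attended length stays at 128 until the 129th token, then jumps to 256, and so on. The set of distinct lengths stays small while early decoding steps avoid reading the unused tail of the cache.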

@xt574chen xt574chen requested a review from a user January 30, 2024 08:42
@vivekgoe vivekgoe requested a review from MrGeva January 30, 2024 09:01
@vivekgoe

@dvarshney-habana @MrGeva since changes are related to Llama inference, please review.

@bhargaveede bhargaveede force-pushed the habana-main branch 2 times, most recently from 9730605 to c1154b2 Compare February 1, 2024 08:54
@puneeshkhanna

Please check my latest comments on the actual PR, huggingface#658.

@xt574chen xt574chen closed this Feb 5, 2024
@xt574chen
Author

@puneeshkhanna thank you for the comments. This branch has conflicts with the latest main, so please review this feature in another PR, #24.

@xt574chen xt574chen deleted the llama_kv_bucket branch February 5, 2024 17:15
astachowiczhabana pushed a commit that referenced this pull request Nov 20, 2024
* [SW-204303] Enable Fp8 flow with INC for sdxl in OH

Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd
sushildubey171 added a commit that referenced this pull request Nov 22, 2024
Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
* [SW-204303] Enable Fp8 flow with INC for sdxl in OH

Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd
3 participants