enable internal kv bucket in llama by xt574chen · Pull Request #11 · HabanaAI/optimum-habana-fork

xt574chen · 2024-01-30T08:42:46Z

What does this PR do?

To improve throughput when generating a large number of new tokens, this PR splits the KV cache into segments whose length is a multiple of the bucket width, and computes attention over that portion rather than over the entire KV cache. To enable the feature, add --bucket_size=128 --bucket_internal to the command line.
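The idea described above can be illustrated with a minimal pure-Python sketch. It is not the code from this PR: the function names (bucketed_length, attend) and the single-head, list-based layout are hypothetical, chosen only to show how attention can be limited to a bucket-aligned prefix of a preallocated KV cache.

```python
import math


def bucketed_length(valid_tokens: int, bucket_size: int, max_len: int) -> int:
    """Smallest multiple of bucket_size that covers valid_tokens, capped at max_len.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if valid_tokens <= 0:
        return 0
    buckets = -(-valid_tokens // bucket_size)  # ceiling division
    return min(buckets * bucket_size, max_len)


def attend(query, keys, values, valid_tokens, bucket_size):
    """Single-head attention for one query over a bucket-aligned cache prefix.

    keys/values are preallocated lists of vectors; only the first
    valid_tokens entries hold real data, the rest is padding.
    """
    if valid_tokens < 1:
        raise ValueError("need at least one cached token")
    span = bucketed_length(valid_tokens, bucket_size, len(keys))
    scale = 1.0 / math.sqrt(len(query))
    scores = []
    for pos in range(span):
        if pos < valid_tokens:
            scores.append(scale * sum(q * k for q, k in zip(query, keys[pos])))
        else:
            # Padding inside the last bucket is masked out.
            scores.append(float("-inf"))
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # masked entries become 0.0
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[pos][d] for pos, w in enumerate(weights)) / total
        for d in range(dim)
    ]
```

With a bucket size of 128 and 129 cached tokens, this sketch would attend over 256 positions instead of the full preallocated length. Passing the full cache length as bucket_size reproduces ordinary attention over the whole cache, which gives a simple way to check that the bucketed result is unchanged.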

vivekgoe · 2024-01-31T07:21:17Z

@dvarshney-habana @MrGeva since changes are related to Llama inference, please review.

puneeshkhanna · 2024-02-05T07:53:44Z

Please check my latest comments on the actual PR, huggingface#658.

xt574chen · 2024-02-05T17:15:29Z

@puneeshkhanna thank you for the comments. This branch has conflicts with the latest main, so please review this feature in a separate PR, #24.

* [SW-204303] Enable Fp8 flow with INC for sdxl in OH Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd

xt574chen requested review from bhargaveede, libinta and vivekgoe as code owners January 30, 2024 08:42

xt574chen requested a review from a user January 30, 2024 08:42

vivekgoe requested a review from MrGeva January 30, 2024 09:01

ghost approved these changes Jan 31, 2024

bhargaveede force-pushed the habana-main branch 2 times, most recently from 9730605 to c1154b2 Compare February 1, 2024 08:54

xt574chen requested review from mandy-li and ssarkar2 as code owners February 1, 2024 08:54

xt574chen closed this Feb 5, 2024

xt574chen deleted the llama_kv_bucket branch February 5, 2024 17:15

astachowiczhabana pushed a commit that referenced this pull request Nov 20, 2024

[SW-204303] Enable Fp8 flow with INC for sdxl in OH (#11)

8a7f080

* [SW-204303] Enable Fp8 flow with INC for sdxl in OH Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd

sushildubey171 added a commit that referenced this pull request Nov 22, 2024

[SW-204303] Enable Fp8 flow with INC for sdxl in OH (#11)

0a625eb

Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd

xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025

[SW-204303] Enable Fp8 flow with INC for sdxl in OH (#11)

44b572d

* [SW-204303] Enable Fp8 flow with INC for sdxl in OH Change-Id: I5bf82613200b4a1dd08882c3731dab57d28992cd

xt574chen wants to merge 0 commits into HabanaAI:habana-main from xt574chen:llama_kv_bucket
