HarryWu99/llm_kvcache_sparsity

LLM KV Cache Sparsity

Implementations of several LLM KV cache sparsity methods, including:

  1. Efficient Streaming Language Models with Attention Sinks, also called "SinkCache"
  2. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
  3. SnapKV: LLM Knows What You are Looking for Before Generation
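As a rough illustration of the first two methods, the sketch below picks which cache positions to keep under each policy. It is not the code from this repository: the function names and parameters (`num_sink`, `num_heavy`, `window`) are hypothetical, and the H2O variant is simplified to a one-shot selection over already accumulated attention scores rather than the step-by-step eviction described in the paper.

```python
from typing import List, Sequence


def sink_keep_indices(seq_len: int, num_sink: int, window: int) -> List[int]:
    """Attention-sink style: keep the first `num_sink` tokens plus the most
    recent `window` tokens. (Hypothetical helper, for illustration only.)"""
    if seq_len <= num_sink + window:
        return list(range(seq_len))  # cache still fits, nothing to evict
    return list(range(num_sink)) + list(range(seq_len - window, seq_len))


def heavy_hitter_keep_indices(
    acc_scores: Sequence[float], num_heavy: int, window: int
) -> List[int]:
    """H2O style (simplified): keep the most recent `window` tokens plus the
    `num_heavy` older tokens with the largest accumulated attention score.

    `acc_scores[i]` is assumed to be the attention mass that token i has
    received so far, summed over past queries.
    """
    seq_len = len(acc_scores)
    if seq_len <= num_heavy + window:
        return list(range(seq_len))
    recent_start = seq_len - window
    # Rank only the tokens outside the recent window.
    heavy = sorted(range(recent_start), key=lambda i: acc_scores[i], reverse=True)
    return sorted(heavy[:num_heavy]) + list(range(recent_start, seq_len))
```

For example, `sink_keep_indices(10, 2, 3)` keeps positions 0 and 1 (the sinks) and 7 to 9 (the recent window). In a real model the same selection would typically be applied per layer, and for H2O per attention head, to index into the key and value tensors.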

To Run

pip install -r requirements.txt
# Edit the LongBench loading call (`load_from_disk`) in example/test.py to match where your copy of the dataset is stored
python example/test.py --sparsity_method snapkv

The result file is written to the results folder.

You can then run longbench_eval/eval.py to compute the scores.

The core code for KV Cache eviction is in models/kv_clusters.py
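To give a feel for what such eviction logic does, here is a minimal, dependency-free sketch of SnapKV-style selection for a single attention head. It is an assumption-laden illustration, not the contents of models/kv_clusters.py: the names `snapkv_keep_indices`, `attn_obs`, `budget`, and `kernel` are hypothetical, and max pooling with stride 1 is chosen here only for simplicity. The idea is that the last few prompt queries (the observation window) vote on which earlier positions matter, the votes are smoothed by pooling so that neighbors of important tokens are kept together, and the top-scoring positions are retained along with the window itself.

```python
from typing import List, Sequence


def pool_1d(values: Sequence[float], kernel: int) -> List[float]:
    """Max pooling with stride 1; the window is clipped at the borders."""
    half = kernel // 2
    n = len(values)
    return [max(values[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def snapkv_keep_indices(
    attn_obs: Sequence[Sequence[float]], budget: int, kernel: int = 3
) -> List[int]:
    """Select prompt positions to keep for one head.

    attn_obs[q][k] is assumed to be the attention weight that the q-th query
    of the observation window (the last len(attn_obs) prompt tokens) puts on
    key position k, with every row covering the full prompt.
    """
    obs_window = len(attn_obs)
    seq_len = len(attn_obs[0])
    if seq_len <= budget:
        return list(range(seq_len))  # prompt already fits in the budget
    if budget <= obs_window:
        raise ValueError("budget must exceed the observation window size")
    prefix_len = seq_len - obs_window
    # Sum the votes of all observation queries for each prefix position.
    votes = [sum(row[k] for row in attn_obs) for k in range(prefix_len)]
    pooled = pool_1d(votes, kernel)
    ranked = sorted(range(prefix_len), key=lambda k: pooled[k], reverse=True)
    top = sorted(ranked[: budget - obs_window])
    # Always keep the observation window itself.
    return top + list(range(prefix_len, seq_len))
```

With `kernel=1` the pooling is a no-op and the selection reduces to a plain top-k over the summed votes; larger kernels favor contiguous runs of positions around high-scoring tokens.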

Results

Todo
