This repository was archived by the owner on Oct 11, 2024. It is now read-only.

[1/N] Rs/vllm quantization - Refactor to minimize llama.py changes #186

Merged
varun-sundar-rabindranath merged 12 commits into vllm-quantization from rs/vllm-quantization
Apr 16, 2024

Commits

- Commits on Apr 12, 2024
- Commits on Apr 13, 2024
- Commits on Apr 16, 2024