Add integration with gemlite weight only quant #2528
Summary:
gemlite is only available with nightly torchao right now (or when torchao is installed from source).
Test Plan:
```
python3 -m sglang.bench_one_batch --model meta-llama/Llama-3.1-8B-Instruct --batch-size 1 --input 1024 --output 512 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' --enable-torch-compile --torchao-config gemlite-4-64 --tp-size 1
```
Reviewers:
Subscribers:
Tasks:
Tags:
Hi @jerryzh168, what is the release cycle of torchao? I can accept using the torchao nightly version; maybe you can try enabling it in https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock
We have roughly monthly releases. Yeah, depending on the nightly version would be better for now, and we can update to a stable version a bit later, I think.
I tried to install the nightly version. Do we just want to add a version check here?
@zhyncs I think we can land this. It's fine to have this as an experimental feature for now; I added a print asking people to use torchao nightly.
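For reference, a rough sketch of what a version check plus a hint message could look like. This is not the code in this PR: `looks_like_nightly` and `gemlite_hint` are hypothetical helper names, the version string in the comment is only an illustrative example, and the `.dev` test is a heuristic that assumes nightly wheels carry a PEP 440 dev suffix (a source install may not).

```python
from importlib import metadata


def looks_like_nightly(version: str) -> bool:
    """Heuristic: treat PEP 440 dev builds (e.g. '0.8.0.dev20241217+cu124') as nightlies."""
    return ".dev" in version


def gemlite_hint(package: str = "torchao") -> str:
    """Build a user-facing hint. Hypothetical helper, not part of sglang or torchao."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return (
            f"{package} is not installed; gemlite quantization needs "
            "a nightly build or a source install."
        )
    if looks_like_nightly(installed):
        return f"{package} {installed} looks like a nightly build."
    # Assumption: anything without a dev suffix is a stable release.
    return (
        f"{package} {installed} looks like a stable release; gemlite may "
        "require a nightly build or a source install."
    )


if __name__ == "__main__":
    print(gemlite_hint())
```

A print-only hint like this keeps the feature experimental without hard-failing on stable installs, which matches the approach discussed above.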